PREFACE


This book covers a preliminary course in Statistics, which
most students of Education and Psychology have to study as a
part of their under-graduate, and in some cases, post-graduate
degree programme in Indian universities. It is meant as an
introductory textbook for beginners, which focuses on the
fundamental concepts and simple methods of Statistics with
a minimum of mathematical sophistication. Only those topics
are covered which are prescribed by most universities for such
students, to avoid burdening them with a book from which they
have to make selective reading.

A word about the need for such a book. There are, of
course, numerous elementary textbooks in Statistics in the
market, but most of them are meant for students of Statistics,
or of Commerce and Economics, and do not adequately serve
the purpose of the students of Education and Psychology, who
are often forced to use these books. On the other hand, there
are a few books written by foreign authors specifically for the
students of Education and Psychology, but most of them are
expensive and cover several topics which are not included in the
prescribed courses of study for such students in our universities.
The students find only a few chapters of these books
relevant and useful. As such the need existed for a book which
specifically meets their requirement, and in this book it has been
our endeavour to meet this need.

The book starts with a discussion of the nature and purpose
of Statistics and the different types of measurement used in the
study of human abilities and behaviour. It is followed by the
methods of summarising the raw data in frequency tables and
presenting them in the form of graphs and statistical charts.
Next, the measures of central tendency, dispersion and correlation
are discussed in three separate chapters, with stress on concepts,
methods of computation and interpretation of results.
Ample numerical examples and exercises have been provided to
enable the students to acquire sufficient practice in the application
of these basic statistical methods of summarising and interpreting
the data. The last two chapters deal with Probability, and the
Binomial and Normal distributions. Again the emphasis is on basic
concepts and the use made of these distributions in dealing with
educational and psychological measurements.

The book does not cover sampling theory, tests of significance
and other relatively advanced topics, as these are included
only in some post-graduate courses and are beyond the scope
of the beginners who take a preliminary course in Statistics at
the B.A. or B.Ed. level. It is, however, proposed to bring out
a separate book covering these topics for the benefit of such
students who continue to study Statistics at the post-graduate
level.

It is hoped that this book will be found useful by the students
for whom it is primarily written. Comments and suggestions
for improvement by the teachers as well as the students
who will use this book will be most welcome.
INTRODUCTION


1. The nature and purpose of statistics 


1.1. Statistics in daily life

In everyday life we deal with data or observations in which
statistics is used. When we take our weights or measure our
heights, when we scrutinise cricket test scores of various players,
when we take note of the weather forecasts or when we
compare the marks obtained in examinations by different
students, we are mentally evaluating the various observations
or measurements. Sometimes we compare a particular observation
with some average which we may have in mind;
for example, we do this when we say that a person is overweight
for his height or a given day's temperature is above
average.

Sometimes, we pass judgement on the observations and say
that they are in the normal range or are unusually high or low.
We often talk of average price of some commodity or price
range in which it is available. We use phrases like average
student in a class, pass percentage in a given examination and
the range of marks in a given subject. In all this, we are using
statistical concepts, even though they may be in a very rudimentary
or crude form. There is an element of statistics in all
these processes of mental evaluation of observations or
measurements.


1.2. Statistics as a subject

Statistics is a branch of science which deals with collection,
analysis and interpretation of data obtained by conducting a
survey or an experimental study. The data so obtained are also
called observations or measurements. It is usually not possible
to derive any conclusion about the main features of data only
from direct inspection of the observations. The purpose of
Statistics is to provide a methodology for classifying and
describing the properties of data in a summary form and to
provide us the techniques and rules for drawing valid inferences
from the data. The science of statistics also helps in planning
of surveys and research studies, since it offers methodology and
techniques of sampling and designing experiments so that valid
conclusions from the data can be drawn. In brief, statistics
provides us the know-how for collection and analysis of data
scientifically.

Coming to analysis of data, statistics plays two important
roles: (i) descriptive and (ii) inferential. Descriptive statistics
is concerned with describing or summarising the numerical
properties of data. The methodology of descriptive statistics
includes classification, tabulation, graphical representation and
calculation of certain indicators such as mean, range etc., which
summarise certain important features of data. Inferential
statistics, which is also referred to as statistical inference, is
concerned with derivation of scientific inference about generalisation
of results from the study of a few particular cases.
Technically speaking, the methods of statistical inference help
in generalising the results of a sample to the entire population
from which the sample is drawn. The nature of inference is
inductive in the sense that we make general statements from
the study of a few cases. Inferential statistics provides us the
tools of making such inductive inference scientific and rigorous.
In such inference, it is presumed that the generalisation cannot
be made with certainty. Some uncertainty is inevitable since
in some cases the inference drawn from the data of a sample
survey or an experiment can be wrong. However, the
degree of uncertainty is itself measurable and one can make
rigorous statements about the uncertainty (or the chance of
being wrong) associated with a particular inference. This
uncertainty in inference is dealt with by applying the theory of
probability, which is the backbone of statistical inference.
It is a branch of mathematical statistics that deals with
measurement of the extent of certainty of events whose
occurrence depends on chance.

It may be noted that the word statistics is used not only to 
denote the subject of statistics, but has also other meanings. As 
a plural noun, it means the same thing as data. For example, 
educational statistics usually mean the data relating to the educa- 
tional system, that is, schools, students, teachers etc. When we 
talk of the statistics of agricultural production, manpower, prices 
etc., we use the word in the sense of data. Statisticians use the 
word statistics in yet another sense. It is the plural of the word 
statistic, which stands for any quantity (e.g. mean, percentage
etc.) calculated from sample observations. Thus sample mean is 
a statistic, which is used as an estimate of the population mean. 


1.3. Use of statistics in psychology and education 

Knowledge of statistics is required in almost all fields of
study these days. Statistical techniques are widely used for
research in physical and natural sciences (like physics, chemistry,
biology), in applied sciences (like geology, anthropology), in
medical and engineering subjects and in social sciences (like
sociology, economics, psychology etc.). In this book we are
mainly concerned with the use of statistics in psychology and
education. While the basic techniques of statistics are the same
irrespective of the subject in which application is made, certain
statistical techniques are particularly related to application in
specific branches (e.g. statistical quality control in engineering,
factor analysis in education and psychology, econometrics in
economics, etc.). Here our purpose is to study only certain basic
techniques of descriptive statistics, which are useful in any
branch of study, but we shall mainly stress their use in psychology
and education by considering examples and illustrations
from these fields.

Knowledge of statistics is particularly useful to students of
psychology and education for the following reasons.

(1) It helps in understanding the modern literature in these
subjects. Most books and articles in research journals in these
subjects use statistical terminology and present the results in
statistical form, which cannot be understood without adequate
knowledge of statistics.

(2) It helps in conducting research investigations for
which sample survey or experimental approach has to be used.
Knowledge of sample survey methods, design of experiments
and statistical methods of data analysis is essential for the
advanced students who have to conduct their own investigations.

(3) It forms the basis of the scientific approach to problems, in
which inductive inference is commonly used. Students of 
psychology and education cannot afford to remain ignorant of 
the scientific method of approach to problem-solving in their 
disciplines. 

(4) It helps the professional psychologist, whether a coun- 
sellor, a guidance worker or a clinical psychologist, in doing 
his work efficiently, since in the course of his work he has to 
administer tests, interpret test scores and maintain a record of 
a number of cases (which constitute data requiring statistical 
analysis for proper interpretation). For all this the knowledge 
of statistics is essential. 

(5) It provides basic tools of data analysis to educationists
who are engaged in planning and administration of an educational
system. They need to know statistics in order to study
past trends of enrolment, to estimate teacher requirements, to
plan new schools and for many other such purposes.

(6) It helps the teachers and school administrators in evaluating
the performance of students and schools. They have to
know some statistics in order to deal with examination data,
test scores of students and quantitative data used for different
types of evaluation.

(7) In psychology and education, quantitative methods are
being increasingly used to study various phenomena, for which
statistical techniques are indispensable. In psychophysics, one
studies relationship between measurements obtained by instruments
and by human judgement. Attempts are made to
measure human ability (e.g. intelligence, scholastic aptitude,
creativity, personality, interest, behaviour, attitude, etc.) by tests.
Then there is the theory of learning, almost entirely based on
statistical principles. There is considerable interest in predicting
future performance of individuals (in any field) on the basis of
their present evaluation with tests. In some problems in which
data on a large number of variables are used to provide a
simple description of relationship among the variables, factor 
analysis helps in identifying the factors which may be 
underlying a large number of variables. A knowledge of 
statistical techniques is required for understanding and problem- 
solving in all these situations, which are quite common in the 
fields of psychology and education. 


1.4. The Basic Concepts of Statistics


1.4.1. Concept of Population 

Statistical methods help in describing the numerical 
properties of and drawing inferences about the nature of 
population from which data are obtained. The word population 
is commonly used to denote a group or an aggregate of people. 
For example, the population of India means all the people 
living in the country at any specified time. In statistics, the
word population is used in a more general sense. Here it refers
to any well defined group or aggregate of people, animals, 
objects, materials, measurements or even ‘happenings’ of a 
particular type. We speak of populations of trees, soil, insects, 
manufactured articles, vital events like births and deaths, road 
accidents, etc. We also have a population of measurements, 
when repeated measurements are taken of the same object. 
Thus the word population denotes a group or aggregate of any 
type. 

Statisticians are concerned with the properties of a group
as a whole, and not of its individual members. For example,
if we have data on the heights of a group of persons, statistical
methods help a statistician in describing such properties of
the group as average height, or the range within which all the
heights lie or the percentage of persons below a certain height,
but an individual's height is of no interest to a statistician.
Only the questions relating to the group as a whole are of
statistical nature to which statistical techniques can be applied.

There are two types of population, finite and infinite.
In a finite population, the number of individual
members is finite. The students attending primary classes in
Meerut city, illiterate adults living in rural areas of U.P., the
table fans produced by a manufacturing firm in a given year
are all examples of a finite population. On the other hand,
the possible tosses of a coin or the possible observations that
can be taken in an experiment constitute infinite populations, since
there is no limit to their number and, at least theoretically, any
number of tosses can be made and any number of observations
can be taken. Usually infinite populations are hypothetical in
nature, that is, their existence can be only imagined or conceptualised.
Finite populations are real. But large finite
populations are generally assumed to be infinite when it comes
to applying methods of statistical inference which are based
on the assumption of an infinite population. For studying the
properties of populations we use statistical methods which help
us in making statements about the properties in numerical terms.
For example, while studying the population of students in
a university, statements can be made about percentage of
students enrolled in different courses or about their average
achievement in different examinations by using suitable statistical
methods. Often, for studying a large population, we resort
to sampling, and generalise the properties of the sample to the
entire population.
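
As an illustration of describing a group rather than its individual members, the following is a minimal Python sketch. The heights, the cut-off of 165 cm and the function name describe_group are all assumptions made up for this illustration; they do not come from any data in the book.

```python
from statistics import mean

# Hypothetical heights in centimetres (illustrative values only).
heights_cm = [150, 155, 160, 165, 170, 175, 180, 185]


def describe_group(values, cutoff):
    """Return group-level properties: mean, (min, max) range, and the
    percentage of members whose value lies below the cutoff."""
    below = sum(1 for v in values if v < cutoff)
    return {
        "mean": mean(values),
        "range": (min(values), max(values)),
        "percent_below": 100 * below / len(values),
    }


summary = describe_group(heights_cm, cutoff=165)
print(summary)
```

Each quantity in the summary is a statement about the group as a whole; no single person's height appears in it.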


1.4.2. Concept of Sample 

In the case of infinite or very large finite populations, it 
becomes practically impossible to collect data from all the 
members in order to study the population characteristics. For 
example, if we want to study the attitude of teachers towards 
their job, it would be very expensive and time-consuming to 
measure the attitude of all the teachers. It will be far cheaper 
and quicker to collect the data from a sample of teachers and 
to generalise the results of the sample study to the entire 
population. A sample is a fraction of a population drawn by
using a suitable method so that it can be regarded as representative
of the entire population. Discussion of various sampling
methods is not within the scope of this book. Suffice it to say
that when a sample is drawn with a suitable sampling method,
one can study the properties of the sample in numerical terms,
and derive inference about the corresponding properties of the
population.

The inference is, of course, subject to some error. But we
can make statements about the average margin of error. For
example, we can say what the average difference between
sample mean and population mean is likely to be and what
the limits are within which the unknown population mean
would lie with a fairly high degree of certainty, on the basis of
the observed sample. In order that a sample helps in estimating
such error, it is necessary that it is selected by an appropriate
method which involves choice of units (or individuals) by some
random process rather than human judgement. In other words,
the inclusion of a given unit of a population in the sample
should depend on chance, rather than on one's judgement, in
order to make the estimation of error feasible.
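
The idea of letting chance decide which units enter the sample can be sketched in Python as below. This is only an illustration under stated assumptions: the population of 1000 attitude scores is artificial, the sample size of 50 is arbitrary, and simple random sampling without replacement is just one of the possible sampling methods (the book does not prescribe a particular one).

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so that the illustration is repeatable

# Hypothetical finite population: attitude scores of 1000 teachers
# (artificial numbers generated only for this sketch).
population = [random.randint(20, 80) for _ in range(1000)]

# Simple random sampling without replacement: every unit is chosen
# by chance, not by the investigator's judgement.
sample = random.sample(population, k=50)

population_mean = mean(population)  # usually unknown in practice
sample_mean = mean(sample)          # calculated from the sample

print(f"population mean = {population_mean:.2f}")
print(f"sample mean     = {sample_mean:.2f}")
print(f"difference      = {sample_mean - population_mean:.2f}")
```

Repeating the sketch with different seeds gives different samples and slightly different sample means, which is the sampling error referred to above.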


1.4.3. Variables, Parameters and Statistic

We study the properties of a population in terms of some
variables. The term variable is used for some characteristic of
a population on which the units (that is, individual members) of
the population differ from one another. For example, age, sex,
height, intelligence, ability to perform a given task, attitude
towards something, etc., are variables on which individuals
differ. We may be interested in any of these variables. While
some variables are amenable to measurement on a scale (e.g.
height, weight), others are such that only differentiation in respect
of a qualitative trait (e.g. skin colour, occupation etc.)
is possible. Thus we have broadly two types of variables,
quantitative and qualitative, but we shall discuss the different
types of variables in more detail in the next chapter.

The numerical quantities which characterise a population 
(in respect of any variable) are called parameters of the 
population. For example, if the variable is height, and
measurements of height are taken for a large population of 
male adults of a tribe, the mean height can be regarded as a 
parameter. Similarly we can think of a parameter which gives a 
measure of variability in the heights of the persons belonging 
to the population. Usually all the important characteristics of 
a population can be specified in terms of a few parameters. 
Statistical methods help in describing the properties of a 
population in terms of its parameters, by providing suitable 
estimates of the parameters from sampled observations.

The term statistic is used to denote any quantity that is
calculated from sample data. A statistic is usually calculated to
provide an estimate of some population parameter. Thus the
mean calculated from a sample is a statistic that serves as an
estimate of the parameter, population mean. The sample
statistics provide information about the population, which is,
in fact, the main purpose of statistical methods. We use the
term descriptive statistics for the statistical procedures that
describe the properties of a population when the relevant data
are available for all the units of the population, or of a sample,
in case the properties of the population are being studied by
drawing a sample from it. The statistical procedures which are
used for drawing inference about the population from sample
data are covered under inferential statistics or sampling statistics.
In the latter case, we calculate certain statistics from the sample
and then use them to derive conclusions about the unknown
population parameters. In the following chapters, we shall study
the methods of descriptive statistics, which can be applied to
any population or sample data. The methods of statistical
inference will be presented in a second part of the book.


1.5. Summation Notation 

In statistics one has to make use of certain formulae and
computational procedures frequently for the analysis of available
data in order to describe its properties. In these formulae
certain statistical notations are universally used and the most
common of these is the summation notation. It is represented
by the capital Greek letter $\Sigma$. If $x_1, x_2, x_3, \ldots, x_n$ are $n$
observations, then their sum $x_1 + x_2 + \cdots + x_n$ is represented
in this notation. We can write

$$\sum_{i=1}^{n} x_i \quad \text{for} \quad x_1 + x_2 + \cdots + x_n.$$

Sometimes, if there is no confusion, we can write it
simply as $\sum x_i$ or simply as $\sum x$.
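The following results hold good about the $\Sigma$ notation:


(a) If all the observations are individually multiplied by a
constant $c$, then

$$\sum_{i=1}^{n} (c\,x_i) = c\,x_1 + c\,x_2 + \cdots + c\,x_n = c\,(x_1 + x_2 + \cdots + x_n) = c \sum_{i=1}^{n} x_i$$

(b) For two sets of observations $x_i$ and $y_i$,

$$\sum_{i=1}^{n} (x_i + y_i) = (x_1 + y_1) + (x_2 + y_2) + \cdots + (x_n + y_n) = (x_1 + x_2 + \cdots + x_n) + (y_1 + y_2 + \cdots + y_n) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i$$

(c) If $c$ is a constant, then

$$\sum_{i=1}^{n} c = c + c + \cdots + c \ (n \text{ times}) = nc$$

Using the above results, we also get

(d) $$\sum_{i=1}^{n} (x_i - c) = \sum_{i=1}^{n} x_i - nc$$

and

(e) $$\sum_{i=1}^{n} (x_i - c)^2 = \sum_{i=1}^{n} x_i^2 - 2c \sum_{i=1}^{n} x_i + nc^2$$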
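
Results (a) to (e) can be checked numerically. The following Python sketch is only an illustration: the function name check_summation_rules and the values of x, y and c are assumptions chosen for the example, and integers are used so that the two sides can be compared exactly without rounding error.

```python
def check_summation_rules(x, y, c):
    """Check results (a) to (e) for two equal-length lists x and y
    and a constant c. Returns a dict mapping rule label to True/False."""
    n = len(x)
    sum_x = sum(x)
    sum_x2 = sum(xi ** 2 for xi in x)
    return {
        # (a) sum of c*x_i equals c times the sum of x_i
        "a": sum(c * xi for xi in x) == c * sum_x,
        # (b) sum of (x_i + y_i) equals sum of x_i plus sum of y_i
        "b": sum(xi + yi for xi, yi in zip(x, y)) == sum_x + sum(y),
        # (c) summing the constant c over n terms gives n*c
        "c": sum(c for _ in range(n)) == n * c,
        # (d) sum of (x_i - c) equals sum of x_i minus n*c
        "d": sum(xi - c for xi in x) == sum_x - n * c,
        # (e) sum of (x_i - c)^2 equals sum x_i^2 - 2c sum x_i + n c^2
        "e": sum((xi - c) ** 2 for xi in x)
        == sum_x2 - 2 * c * sum_x + n * c ** 2,
    }


# Illustrative integer values (not taken from the book).
results = check_summation_rules(x=[2, 5, 7, 4], y=[1, 3, 6, 8], c=3)
print(results)
```

A numerical check of this kind does not prove the results, but it is a quick way to catch a slip when expanding a summation by hand.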


Example 1.1. A variable $x$ assumes the values $x_1 = 12$,
$x_2 = 17$, $x_3 = 25$, $x_4 = 8$ and $x_5 = 13$ respectively. Calculate (a) $\sum x$,
(b) $\sum x^2$, (c) $\sum (x - 15)^2$.

Solution:
(a) Since $x_1 = 12$, $x_2 = 17$, $x_3 = 25$, $x_4 = 8$ and $x_5 = 13$,
therefore

$$\sum x = x_1 + x_2 + x_3 + x_4 + x_5 = 12 + 17 + 25 + 8 + 13 = 75$$

(b) $$\sum x^2 = x_1^2 + x_2^2 + x_3^2 + x_4^2 + x_5^2 = 12^2 + 17^2 + 25^2 + 8^2 + 13^2 = 144 + 289 + 625 + 64 + 169 = 1291$$

(c) $$\sum (x - 15)^2 = (x_1 - 15)^2 + (x_2 - 15)^2 + (x_3 - 15)^2 + (x_4 - 15)^2 + (x_5 - 15)^2$$
$$= (12 - 15)^2 + (17 - 15)^2 + (25 - 15)^2 + (8 - 15)^2 + (13 - 15)^2$$
$$= (-3)^2 + (2)^2 + (10)^2 + (-7)^2 + (-2)^2 = 9 + 4 + 100 + 49 + 4 = 166$$

Note that in each case the subscript $i$ is omitted and
$\sum$ is understood as $\sum_{i=1}^{5}$.
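
The arithmetic of Example 1.1 can be reproduced with a few lines of Python. This is a sketch for checking the hand computation; the variable names are chosen here for illustration only.

```python
# Values of x from Example 1.1.
x = [12, 17, 25, 8, 13]

sum_x = sum(x)                                   # (a) sum of x
sum_x_squared = sum(xi ** 2 for xi in x)         # (b) sum of x^2
sum_sq_dev = sum((xi - 15) ** 2 for xi in x)     # (c) sum of (x - 15)^2

print(sum_x, sum_x_squared, sum_sq_dev)  # prints: 75 1291 166

# Result (e) with c = 15 and n = 5 gives the same value for part (c).
via_rule_e = sum_x_squared - 2 * 15 * sum_x + len(x) * 15 ** 2
print(via_rule_e)  # prints: 166
```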


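Example 1.2. If $\sum_{i=1}^{6} x_i = 4$ and $\sum_{i=1}^{6} x_i^2 = 10$, calculate

(a) $\sum_{i=1}^{6} (3x_i + 4)$ (b) $\sum_{i=1}^{6} x_i (x_i + 2)$ (c) $\sum_{i=1}^{6} (x_i - 1)^2$

Solution:

(a) $$\sum_{i=1}^{6} (3x_i + 4) = 3 \sum_{i=1}^{6} x_i + (6 \times 4) = 3 \times 4 + 6 \times 4 = 12 + 24 = 36$$

(b) $$\sum_{i=1}^{6} x_i (x_i + 2) = \sum_{i=1}^{6} x_i^2 + 2 \sum_{i=1}^{6} x_i = 10 + 2 \times 4 = 18$$

(c) $$\sum_{i=1}^{6} (x_i - 1)^2 = \sum_{i=1}^{6} (x_i^2 + 1 - 2x_i) = \sum_{i=1}^{6} x_i^2 + \sum_{i=1}^{6} 1 - 2 \sum_{i=1}^{6} x_i = 10 + 6 \times 1 - 2 \times 4 = 8$$

If desired, we can omit the subscript $i$ and use $\sum$ in place
of $\sum_{i=1}^{6}$.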
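
In Example 1.2 the individual observations are not known; only n, the sum of x and the sum of x squared are given. The Python sketch below therefore works from those three totals alone, using results (a) to (c) of the summation notation. The variable names are assumptions made for this illustration.

```python
# Given totals from Example 1.2 (individual x values are not known).
n = 6          # number of observations
sum_x = 4      # sum of x_i
sum_x2 = 10    # sum of x_i squared

# (a) sum of (3*x_i + 4) = 3 * sum_x + n * 4
part_a = 3 * sum_x + n * 4

# (b) sum of x_i * (x_i + 2) = sum_x2 + 2 * sum_x
part_b = sum_x2 + 2 * sum_x

# (c) sum of (x_i - 1)^2 = sum_x2 - 2 * sum_x + n * 1
part_c = sum_x2 - 2 * sum_x + n * 1

print(part_a, part_b, part_c)  # prints: 36 18 8
```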


Problem Set 1 


1. Describe various concepts of the word 'Statistics'.

2. Discuss the role of statistics in daily life.

3. Discuss in detail the importance of statistics with special
reference to psychology and education.

4. Define the following terms as used in statistics:


(i) Population
(ii) Finite Population
(iii) Infinite Population
(iv) Sample
(v) Variable and parameter
(vi) Descriptive statistics
(vii) Inferential statistics.

5. Write the terms in each of the following summation
notations:

(a) $\sum_{i=1}^{5} x_i$ (b) $\sum_{i=1}^{4} (y_i + 2)$ (c) $\sum_{i=1}^{4} x_i y_i$ (d) $\sum_{i=1}^{6} a$

6. Express the following in summation notation:

(a) $x_1^2 + x_2^2 + x_3^2 + \cdots + x_{10}^2$

(b) $(x_1 + y_1) + (x_2 + y_2) + \cdots + (x_6 + y_6)$

(c) $f x_1^2 + f x_2^2 + \cdots + f x_6^2$

(d) $a_1 b_1 + a_2 b_2 + \cdots + a_n b_n$

(e) $(x_1 + 2) + (x_2 + 2) + \cdots + (x_n + 2)$

(f) $x_1 (x_1 + 4) + x_2 (x_2 + 4) + \cdots + x_6 (x_6 + 4)$.

7. Two variables $x$ and $y$ assume the values $x_1 = 4$, $x_2 = 3$,
$x_3 = -1$, $x_4 = 5$ and $y_1 = 3$, $y_2 = 2$, $y_3 = 4$, $y_4 = 1$ respectively.
Calculate

(a) $\sum x$, (b) $\sum y$, (c) $\sum xy$, (d) $\sum x^2$, (e) $\sum y^2$.

8. If $\sum_{i=1}^{4} x_i = -2$ and $\sum_{i=1}^{4} x_i^2 = 12$, find the values of

(a) $\sum_{i=1}^{4} (3x_i + 2)$ (b) $\sum_{i=1}^{4} x_i (2 + x_i)$ (c) $\sum_{i=1}^{4} (x_i + 5)^2$
(d) $\sum_{i=1}^{4} (x_i - 2)^2$.

Answers

5. (a) $x_1 + x_2 + x_3 + x_4 + x_5$
(b) $(y_1 + 2) + (y_2 + 2) + (y_3 + 2) + (y_4 + 2)$
(c) $x_1 y_1 + x_2 y_2 + x_3 y_3 + x_4 y_4$
(d) $a + a + a + a + a + a$

6. (a) $\sum_{i=1}^{10} x_i^2$ (b) $\sum_{i=1}^{6} (x_i + y_i)$ (c) $\sum_{i=1}^{6} f x_i^2$
(d) $\sum_{i=1}^{n} a_i b_i$ (e) $\sum_{i=1}^{n} (x_i + 2)$ (f) $\sum_{i=1}^{6} x_i (x_i + 4)$

7. (a) 11 (b) 10 (c) 19 (d) 51 (e) 30

8. (a) 2 (b) 8 (c) 92 (d) 36

MEASUREMENTS, THE FREQUENCY
DISTRIBUTION AND TABULATION 


2.1. Fundamentals of Measurements 

In everyday life, we come across objects, living beings and
phenomena, which vary in a number of ways, even though they
may belong to the same general category or type. Thus
products manufactured by an industrial unit differ in dimensions
and quality; animals of the same class may differ in speed;
trees or plants in height or amount of fruits produced; people
in respect of their age, sex, height, weight, intelligence, habits,
personality, etc. In the field of education and psychology we
study differences in respect of persons' personality traits,
abilities, aptitudes, etc. For example, college students of the
same class would differ in their performance on a particular
test or on marks obtained in examinations. In all such cases,
we are dealing with characteristics that vary or fluctuate in a
rather unpredictable way. We find that shape or quality is a
characteristic on which objects vary; speed is a characteristic on
which animals vary; height is a characteristic on which trees
vary and people vary in respect of various characteristics like
age, sex, height, weight and personality traits etc. The characteristic
on which individuals differ among themselves is called a
variable. Thus speed, shape, height, weight, age, sex, grades
are variables in the above examples. In educational and
psychological studies we often deal with variables relating to
intellectual abilities. Now, it is the aim of every physical and
behavioural science to study the nature of the variation in whatever
variable it is dealing with, and therefore, it is necessary to
measure the extent and type of variation in a variable. Statistics
is a branch of science which is concerned with the study of
variables that vary in unpredictable fashion and helps in providing
an understanding of the phenomena and objects which
show such variations.


2.2. Types of variables 
All variables can be broadly classified in the following
categories:


2.2.1. Quantitative variables.
2.2.2. Qualitative variables.


2.2.1. Quantitative variables

Whenever the measurement of a variable is possible on a
scale in some appropriate units, it is called a quantitative variable.
On such variables, objects vary in magnitude and degree
and the measurements indicate such variation. Examples of
quantitative variables are: age, height, income and intellectual
ability etc. Here age is measurable in years or months, height
in cms., income in rupees and intellectual ability in terms of
scores on a test. With quantitative variables, objects can be
placed into ordered classes or categories, i.e., we can say that
one class is higher than the other on a continuum. For example,
students scoring 60 per cent or more marks may be
ranked in 1st class, scoring between 45 per cent and 60 per cent
in 2nd and the others in 3rd class. Further, a student
scoring highest may be ranked as No. 1, scoring next highest
as No. 2 and so on. Similarly, children may be ranked according
to their height, weight, age or final status in a race. These
numbers 1, 2, 3 etc., used for ranking the ordered classes or
individuals are termed as ordinal. The ordinal numbers only
have the property of order and not the other properties of
number like additivity etc. Thus, the difference between ranks
1 and 2 is not '1' but it is merely a difference between magnitude
or degree of the property which is also unspecified. On
the other hand, measurements on a quantitative variable, called
variates, provide actual description of the magnitude or degree
of the property under consideration. The observed weights of
persons and the income they earn per month, the scores of 50
students in an examination, the number of rooms in houses
etc., are a few examples of such measurements.
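
The difference between variates and ordinal numbers can be seen in a small Python sketch. The student names and percentage marks below are invented for illustration, and the function name division is an assumption; the class limits of 60 and 45 per cent follow the example in the text.

```python
# Hypothetical percentage marks of five students (made-up data).
marks = {"Asha": 72, "Bimal": 58, "Chitra": 44, "Dev": 69, "Esha": 51}


def division(pct):
    """Place a percentage into an ordered class, as in the text."""
    if pct >= 60:
        return "1st class"
    if pct >= 45:
        return "2nd class"
    return "3rd class"


# Ordinal ranks: 1 for the highest scorer, 2 for the next, and so on.
ranked = sorted(marks, key=marks.get, reverse=True)
ranks = {name: position for position, name in enumerate(ranked, start=1)}

for name in ranked:
    print(ranks[name], name, marks[name], division(marks[name]))

# Adjacent ranks always differ by 1, but the underlying marks do not.
gaps = [marks[a] - marks[b] for a, b in zip(ranked, ranked[1:])]
print(gaps)  # prints: [3, 11, 7, 7]
```

The ranks record only the order of the students; the unequal gaps in marks between neighbouring ranks are lost once the scores are replaced by ranks.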
Types of Quantitative Variables
Quantitative variables can be further sub-divided into:


2.2.1. (a) Discrete or discontinuous variable. 
2.2.1. (b) Continuous variable. 


2.2.1. (a) Discrete variable 

Discrete or discontinuous variable is one where the values of
the variable differ from one another by definite amounts, i.e.,
these vary only by finite 'jumps' or breaks. Examples are the
number of persons in each family, the number of students
passing a test in different classes, the number of rooms in a
house and the number of accidents among workers in a factory.
Thus, by nature, we cannot fractionate a discrete variable: a
family may have none, one or two children but cannot have a
fraction of a child. Similarly a worker cannot have 2.4
accidents. A class may have 20, 25 or 40 students but it cannot
have 20.5, 25.2 or 40.8 students. Generally, measurements of a
discrete variable are obtained by counting.


2.2.1. (b) Continuous variable

On the other hand, a continuous variable can theoretically
assume all values within a certain interval and as such is
divisible into smaller and smaller fractional units. Thus values
of a continuous variable have no breaks or 'jumps'. Age,
distance, height, weight and intelligence quotient are some
examples of a continuous variable. If age is our variable, say x,
then it can, theoretically at least, assume an infinite number
of values between eight and nine years. Thus, the measure of
a continuous variable can never be exact. For example, intelligence
quotient is usually taken as increasing by one unit on an
ability scale. But with more and more sophisticated and refined
scales of measurement we can perhaps get an I.Q. of 99.9 or
even 99.92. Generally most of the variables studied in measuring
mental and physical traits are continuous variables.


2.2.2. Qualitative variable
A qualitative variable shows variation in objects not in terms
of magnitude, but in quality or kind. These qualities are called
attributes. A qualitative variable is unmeasurable with a scale
and as such is inexpressible in magnitudes. Sex, nationality,
occupation, religion, type of crime, marital status, literacy etc.,
are the examples of a qualitative variable. People vary according
to sex as 'male' and 'female', according to nationality as
'American', 'French', 'Italian' or 'Indian'. Students in a college
may be classified as belonging to 'Science', 'Arts' or 'Commerce'
faculty. In this system of classification there is no natural
ordering in the classes. It is either purely arbitrary or done on
the basis of the presence or absence of a particular attribute in 
an individual or object. Sometimes, however, we use letters or 
numbers in place of words to designate these classes so that 
they can be easily identified and distinguished from one 
another. Such numbers used to represent classes or categories 
are called nominal. Nominal numbers are used merely because 
they are simple and more convenient than names, otherwise, 
they are quite different from natural numbers as they do not 
have the property of ordering and additivity. 


2.3. Scale of measurement 
Quantitative variables can be classified another way in
respect of scale of measurement into two classes:


2.3.1. Ratio Scale
2.3.2. Interval Scale


2.3.1. Ratio Scale

Ratio scales are those scales where the absolute zero point
is known. These measures are expressed in equal units. For
example, we can say that a height of 90 centimetres is twice
the height of 45 centimetres because the scale by which height
is measured has an absolute zero point which indicates complete
absence of height. These scales are used when we want to
compare two or more individuals in respect of the relative
amount of some property they possess. If we do not know the
zero point on the scale which represents complete absence of
the property, then we cannot make statements in relative terms.
Weight, length, time, speed etc., are some variables measured
on ratio scales in physical sciences.


2.3.2. Interval Scale

Interval scales do not have an absolute zero. For example,
suppose we want to measure arithmetic ability and use a test for
the purpose. Then we cannot say that a person who fails to solve
even a single problem has zero arithmetic ability. The zero on
our test, then, does not represent a complete absence of
arithmetic ability. Consequently we cannot say that a person
solving 16 problems has twice the arithmetic ability of one
solving 8 problems correctly. Scales where the absolute zero point
is unknown are termed interval scales. For most of the
psychological variables the scales are interval scales. The zero
points on these scales are arbitrary points and there is no
absolute zero representing a complete absence of the trait being
measured.


2.4. Classification or Grouping of Measurements 

Statistics is often called the science of collection,
organization, presentation, analysis and interpretation of
numerical data. Collection of data constitutes the first step in
statistical investigation. An investigator usually makes
measurements on a number of individuals or repeatedly on the same
individual; for example, consider the data consisting of scores
of 50 students in a statistics paper in Table 2.1, the maximum
marks allotted to the paper being 100.

TABLE 2.1 


Scores of 50 students, ungrouped data 


50. 217 75 22 55 67 74 55 4 64 
61 71 25 40 25 54 64 37 88 44 
70,231 7517 81 45 63 49 43 #85 67 
31 68 45 38 59 15 57 29 6 50 
84 56 88 56 63 32 55 88 79 78 


In this example, the variate is the score achieved by a
student in the paper. The data in the original form, as shown
above, are called raw or ungrouped data. Here the data are
reported in raw form without making any attempt to group or
classify them. With raw data, it is difficult to give a precise
idea of the students’ performance in the test. This is so because
the human mind cannot readily visualise the variation in a large
multitude of separate observations and fails to form a concise
picture of the main features of the data. It is, therefore,
essential for an investigator to condense a mass of data into
a more comprehensible and assimilable form. Classification or
grouping of the collected data is a first step in this
direction. Classification implies that related facts or
observations are grouped into classes or categories. Facts in one
class differ from those of another class with respect to some
characteristic. There are two basic types of classification
corresponding to the two types of variables, namely:


2.4.1. Qualitative classification 
2.4.2. Quantitative classification. 


2.4.1. Qualitative classification 

In this type of classification data are classified on the basis
of some attribute or quality. For example, if the students in a
class are to be classified in respect of one attribute, say sex,
then we can classify them into two classes, namely males
and females. Similarly, they can also be classified into
‘employed’ or ‘unemployed’ on the basis of another attribute,
‘employment’. Thus, when the classification is done with
respect to one attribute which is dichotomous in nature, two
classes are formed, one possessing the attribute and the other
not possessing the attribute. This type of classification is called
simple or dichotomous classification.

On the other hand, we may classify students simultaneously
with respect to two attributes, e.g., sex and employment. Then
students are first classified with respect to ‘sex’ into ‘males’
and ‘females’. Each of these classes may then be further
subdivided into ‘employed’ and ‘unemployed’ on the basis of the
attribute ‘employment’, and as such students are classified into
four classes, namely,

(i) Male employed
(ii) Male unemployed
(iii) Female employed
(iv) Female unemployed

and then the classification in tabular form may be put as shown
in Table 2.2.


TABLE 2.2

Students classified with sex and employment

                    Employment
Sex          Employed     Unemployed     Total

Male

Female

Total


Still, the classification may be further extended by con- 
sidering other attributes like marital status etc. The classi- 
fication, where two or more attributes are considered and 
several classes are formed, is called a manifold classification. 
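
As an illustration of classifying by two attributes at once, the following is a minimal Python sketch that counts students in each of the four classes of Table 2.2. The list of students is hypothetical and invented only for the example; it is not data from this book.

```python
from collections import Counter

# Hypothetical students, each described by two attributes: (sex, employment).
students = [
    ("male", "employed"),
    ("male", "unemployed"),
    ("female", "employed"),
    ("male", "employed"),
    ("female", "unemployed"),
    ("female", "employed"),
    ("female", "employed"),
]

# Count how many students fall in each (sex, employment) class.
counts = Counter(students)

# Print the counts row by row, with a row total, in the layout of Table 2.2.
for sex in ("male", "female"):
    row = [counts[(sex, status)] for status in ("employed", "unemployed")]
    print(sex, row, "total:", sum(row))
```

Adding a third attribute, such as marital status, to each tuple would give the classes of a manifold classification in the same way.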


2.4.2. Quantitative classification: Grouped data
Frequency Tabulation

A first step in the direction of putting variates in some
ordered form is to arrange them in ascending or descending
order of magnitude. The data are then said to be in an array.
As an example, the data in Table 2.1 may be arranged in an
ascending order of magnitude as shown in Table 2.3.

But, still, it does not fully eliminate the confusion which
may result from a large number of scores. Thus to make our


TABLE 2.3


Data in Table 2.1 arranged in an array 


21 31 40 45 51 56 61 66 TA 79 
25 32 42 47 54 56 63 67 74 81 
25 35 43 49 55 57 63 67 Jo 84 
29 37 44 50 55 58 64 68 75 84 
3 38 45 50 55 59 64 70 78 88 


data more comprehensible we group or classify identical values
of the variable into ordered class-intervals. Grouping our
data into ordered classes and determining the number of
observations in each of the classes, we form a frequency
distribution. The tabular form of a frequency distribution gives
us a frequency table. Though arranging the data in the form of an
array gives some order to the given observations, it is not
necessary to arrange the data in ascending or descending order
to form frequency tables.

To illustrate the construction of a frequency table, consider 
the data in Table 2.1, which represents the scores of 50 students 
in a statistical test. 

Here we first decide about the number of classes into which
data are to be grouped. This is done arbitrarily, although we
are guided by the number of observations to be grouped.
Ordinarily, the number of classes should be between 5 and 20.
The number of classes would also depend on the number of
observations: with a larger number of observations one can have
more classes. An advantage of a large number of classes is
that it leads to better accuracy in the computation of statistical
measures like the mean, but if there are too many classes, it
defeats the purpose of grouping. On the other hand, too few
classes would sacrifice too much information and would
lead to more errors in calculations of the mean etc.

Another factor used for determining the number of classes
is the size, width or range of the class, usually called the
class-interval and denoted by h. Class intervals should, whenever
possible, be of uniform width. They are then easier to comprehend
and make the calculations easier. However, there are exceptions
to the rule. The school population, as an example, may be grouped
in age classes 6-13, 14-17 and 18-22 of width 8, 4 and 5
respectively. These groups or classes correspond to certain
levels of education, e.g., elementary, secondary and higher
education levels. Further, the width of the class should be a
whole number conveniently divisible by 2, 3, 5, 10 or 20. Now,
after deciding about the class interval, say 10, we calculate the
range (highest score minus lowest score) of scores to be grouped.
From the data in Table 2.1, the range of scores is R = 88 - 21 = 67
(the range being denoted by R). Now apply the following formula
(2.1) to get the approximate number of classes which one should
expect when grouping the given observations.


$$\text{Number of classes} = \frac{R}{h} = \frac{\text{Range of scores}}{\text{class-interval}} \qquad (2.1)$$


In the case of data in Table 2.1, one should expect to have
67/10 = 6.7 or 7 classes in whole numbers. Formula (2.1) can
also be used to decide about the class-interval, if we know the
range of scores and the number of classes used in grouping, as

$$\text{Class interval} = h = \frac{R}{\text{No. of classes}} \qquad (2.2)$$
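
The arithmetic of formulas (2.1) and (2.2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and rounding the quotient up to a whole number follows the "6.7 or 7 classes" reasoning in the text. The result is approximate, since the actual number of classes also depends on where the first class starts.

```python
import math

def number_of_classes(highest, lowest, class_interval):
    """Approximate number of classes from formula (2.1): R / h, rounded up."""
    score_range = highest - lowest  # R = highest score minus lowest score
    return math.ceil(score_range / class_interval)

def class_interval_for(highest, lowest, n_classes):
    """Approximate class interval from formula (2.2): R / number of classes."""
    return (highest - lowest) / n_classes

# Values quoted in the text for Table 2.1: highest 88, lowest 21, h = 10.
print(number_of_classes(88, 21, 10))  # 67 / 10 = 6.7, rounded up to 7
```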


Having determined the class interval, one must decide where
to start the classes. Since for the data in Table 2.1 the lowest
score is 21, we might begin with 20, as it is common to let
the first class start with a number which is a multiple of the
class interval. Now there are three methods of describing class
limits for different classes.


(a) Exclusive Method 
(b) Inclusive Method 
(c) True class Limits. 

(a) Exclusive Method

In the exclusive method of class formation, we add the class
interval 10 to the lower limit of the lowest class to find the
upper limit of the class as 20 + 10 = 30. Thus our lowest class
becomes 20—30. The remaining class limits and classes are
obtained by adding the class interval to each class limit until
we reach the seventh class, 80—90, which contains the highest
score of 88 in the data in Table 2.1. The seven classes thus
formed are shown in Table 2.4.


TABLE 2.4 


Exclusive classes 


Scores classes 


20—30
30—40
40—50
50—60
60—70
70—80
80—90


Thus in the exclusive method of class formation, classes are so
formed that the upper limit of one class is the lower limit of
the next class and, therefore, this method of classification
ensures continuity between two successive classes, which is
essential for most statistical calculations. However, students
should note that in exclusive classes it is always presumed that
the score or observation equal to the upper limit of the class is
excluded, e.g., a score of 30 will be included in the class
30—40 and not in 20—30. A better way of expressing exclusive
classes is given in Table 2.5.


TABLE 2.5

Exclusive classes

Scores classes     Better expression for classes
20—30              Scores of 20 or more but below 30
30—40              Scores of 30 or more but below 40
40—50              Scores of 40 or more but below 50
50—60              Scores of 50 or more but below 60
60—70              Scores of 60 or more but below 70
70—80              Scores of 70 or more but below 80
80—90              Scores of 80 or more but below 90


(b) Inclusive Method 

Unlike exclusive classes, inclusive classes include scores or
observations which are equal to the upper limit of the class. In
the formation of such classes we start with the lower limit 20
of the scores for the first class, and then the lowest class is
formed as 20—29 so as to include 10 scores (10 being the class
interval). These 10 scores are 20, 21, 22, ..., 29. The other six
classes are obtained by adding the class interval to each class
limit of the previous class until we get the highest class as
80—89. Inclusive classes so formed are listed in Table 2.6.


TABLE 2.6 
Inclusive classes 


Scores classes 


20—29 
30—39 
40—49 
50—59 
60—69 
70—79 
80—89


Now, it is clear from the preceding discussion that the exclusive
method should be used when data are of a continuous nature or
have been measured in fractions of a unit. The inclusive way of
forming classes may be preferred when measurements on the
variable are given in whole numbers. In view of this, inclusive
classes are generally used in the classification of data related to
education and psychology, as in such cases we generally
measure our variable in whole numbers or the measurements
are converted to the nearest whole number.
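
The difference between the two methods can be seen by asking which class a given score falls into. The sketch below is an illustration only; the function names and the default starting point of 20 with class interval 10 are assumptions taken from the running example, not a prescribed procedure.

```python
def exclusive_class(score, start=20, h=10):
    """Exclusive method: the class (lower, upper) with lower <= score < upper."""
    k = int((score - start) // h)  # index of the class counted from the first
    lower = start + k * h
    return lower, lower + h

def inclusive_class(score, start=20, h=10):
    """Inclusive method for whole-number scores: lower <= score <= upper."""
    k = (score - start) // h
    lower = start + k * h
    return lower, lower + h - 1

print(exclusive_class(30))    # a score of 30 goes to 30-40, not 20-30
print(exclusive_class(29.9))  # a fractional score just below 30 stays in 20-30
print(inclusive_class(29))    # 29 is the upper limit of 20-29 and is included
```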


(c) True or Actual class limits 

We observe that in the inclusive method the upper class limit is
not equal to the lower class limit of the next class and so there
is no continuity between classes. However, for statistical
calculations it is desirable that classes are continuous. To
overcome this difficulty we assume that an observation or a score
does not just represent a point on a continuous scale, but an
interval of unit length of which the given score is the middle
point. For example, a score of 20 upon a test represents the
interval 19.5 to 20.5 on a continuum. Similarly, a score of 29 is
representable by the interval 28.5 to 29.5. Thus, the mathematical
meaning of a score is an interval which extends from
0.5 unit below to 0.5 unit above the face value of the score on
a continuum. Class limits obtained in this way are termed true or
actual class limits. Thus, the true class limits for the class
20—29 become 19.5—29.5. The true limits for the remaining
classes are also given in Table 2.7.


TABLE 2.7

Inclusive and true score classes

Inclusive            True or Actual class limits
scores classes
20—29                19.5—29.5
30—39                29.5—39.5
40—49                39.5—49.5
50—59                49.5—59.5
60—69                59.5—69.5
70—79                69.5—79.5
80—89                79.5—89.5
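
The conversion from inclusive limits to true limits is mechanical, as the following minimal Python sketch suggests. The function name and the `unit` parameter (the unit of measurement, assumed to be 1 for whole-number scores) are hypothetical and introduced only for illustration.

```python
def true_class_limits(lower, upper, unit=1):
    """Extend inclusive class limits by half a unit on either side."""
    half = unit / 2
    return lower - half, upper + half

# Inclusive classes of Table 2.6 and their true limits.
inclusive = [(20, 29), (30, 39), (40, 49), (50, 59), (60, 69), (70, 79), (80, 89)]
for lower, upper in inclusive:
    print((lower, upper), "->", true_class_limits(lower, upper))
```

With true limits the upper limit of each class coincides with the lower limit of the next, which restores the continuity between classes.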


Finally, we count the number of scores falling in each class
and record the appropriate number in the frequency (denoted by f)
column. The number of scores or observations falling in each
class is termed the class frequency. To facilitate counting these
frequencies, prepare a tally chart with a ‘tally bar’ column as
shown in Table 2.8.


TABLE 2.8 


Data from Table 2.1 grouped into a frequency table 
(Exclusive Method) 


Scores classes     ‘Tally Bar’             Frequency
                                              (f)

20—30              ||||                        4
30—40              ||||/ |                     6
40—50              ||||/ |||                   8
50—60              ||||/ ||||/ ||             12
60—70              ||||/ ||||                  9
70—80              ||||/ ||                    7
80—90              ||||                        4

                   Total                      50=N


In Table 2.8, four tally bars are marked for the four scores
21, 25, 25 and 29 which lie within the class 20—30; similarly six
bars are marked for the six scores lying in the class 30—40 and so
on. In this way, the data in Table 2.1 are arranged in the form
of a frequency distribution showing classes and the corresponding
class frequencies. The total of all the frequencies is denoted
by N. The frequency table in Table 2.8 represents the tabular
form of this frequency distribution.

Similarly, we can group the data in Table 2.1 in the form of
a frequency distribution by using inclusive and true class limits.
Such a frequency table is given below in Table 2.9.


TABLE 2.9 


Data in Table 2.1 grouped into a frequency table
(Inclusive Method)

Scores classes     True class       Class mid     Frequency
                   limits           point         (f)
20—29              19.5—29.5        24.5           4
30—39              29.5—39.5        34.5           6
40—49              39.5—49.5        44.5           8
50—59              49.5—59.5        54.5          12
60—69              59.5—69.5        64.5           9
70—79              69.5—79.5        74.5           7
80—89              79.5—89.5        84.5           4

                                    Total         N=50


In the process of forming a frequency distribution, we
observe that individual scores or observations lose their
identity when grouped into classes. We need, therefore,
to make some assumption about the location of all the scores
or observations included in a class. On the assumption that
grouped observations are more or less uniformly distributed
between class limits, a point mid-way between the class limits can
be chosen as representative of all the observations in that class.
This point, which is called the mid-point of the class, is
calculated by averaging the lower and upper class limits as
follows:

$$\text{Mid-point of the class} = \frac{\text{Lower class limit} + \text{Upper class limit}}{2} \qquad (2.3)$$

For example, the mid-point of the class 20—29 will be
(20 + 29)/2 = 24.5. Mid-points of the remaining classes are given
in Table 2.9.
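
Formula (2.3) can be checked with a short Python sketch. The function name is hypothetical; the class limits are those of Table 2.9. The sketch also shows that the inclusive limits and the true limits of a class give the same mid-point.

```python
def class_mid_point(lower, upper):
    """Mid-point of a class by formula (2.3): average of its two limits."""
    return (lower + upper) / 2

inclusive = [(20, 29), (30, 39), (40, 49), (50, 59), (60, 69), (70, 79), (80, 89)]
mid_points = [class_mid_point(lower, upper) for lower, upper in inclusive]
print(mid_points)

# The true limits 19.5 and 29.5 give the same mid-point as 20 and 29.
print(class_mid_point(19.5, 29.5) == class_mid_point(20, 29))
```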

Now, the steps in grouping a large set of data into a frequency 
distribution may be summarized as follows: 


1. Decide on the number of classes.

2. Determine the range, i.e., the difference between the
highest and lowest observations in the data.

3. Divide the range by the number of classes to estimate the
approximate size of the interval.

4. Find the lower class limit of the lowest class and add
to it the class-interval to get the upper class limit.

5. Obtain class-limits for the remaining classes by
adding the class-interval to the limits of the previous
class.

6. Count the number of observations in each class and check
the total against the total number of observations.
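
The steps above can be sketched as a small Python function. This is a minimal illustration, assuming whole-number scores, inclusive classes and a class interval chosen in advance; the function name is hypothetical and the scores are invented for the example, not taken from Table 2.1.

```python
from collections import Counter

def frequency_table(scores, h=10, start=None):
    """Group whole-number scores into inclusive classes of width h."""
    if start is None:
        # Start the first class at the multiple of h at or below the lowest score.
        start = (min(scores) // h) * h
    # Class index of each score, counted from the first class.
    counts = Counter((s - start) // h for s in scores)
    n_classes = (max(scores) - start) // h + 1
    table = []
    for k in range(n_classes):
        lower = start + k * h
        table.append((lower, lower + h - 1, counts.get(k, 0)))
    # Step 6: the class frequencies must add up to the number of observations.
    assert sum(f for _, _, f in table) == len(scores)
    return table

# Hypothetical scores, used only to illustrate the procedure.
scores = [21, 25, 34, 38, 41, 47, 47, 52, 55, 59, 63, 88]
for lower, upper, f in frequency_table(scores):
    print(f"{lower}-{upper}: {f}")
```

Note that a class with no observations, such as 70-79 here, is still listed with a frequency of zero so that the classes remain continuous.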


2.5. Cumulative Frequency Distribution 

Sometimes we may be interested to know the number of
persons having scores above or below a specified score. The
Principal, for example, may be interested in knowing how many
students have scored above 60 or below 30. Such results can be
easily obtained from a cumulative frequency distribution, which
is derived from a simple frequency distribution by merging
successive class-intervals until all have been cumulated. Such a
cumulation procedure may be of two types, resulting in two
types of cumulative frequencies, namely:

2.5.1. ‘Less than’ cumulative frequency.
2.5.2. ‘More than’ cumulative frequency.


2.5.1. ‘Less than’ cumulative frequency distribution

By adding the frequencies of all scores less than the ‘true’ upper
limit of a class, we get the ‘less than’ cumulative frequency. Thus,
for calculating these frequencies the ‘true’ upper limit of a class
is regarded as the reference point. The ‘less than’ frequency table
obtained for the grouped data of Table 2.9 is shown in Table
2.10.


2.5.2. 'More than' cumulative frequency distribution 

For ‘More than’ cumulative frequency ‘true’ lower limit 
of the class is taken as the reference point. By adding the 
frequencies corresponding to more than ‘true’ lower limit of a 
class, we get ‘more than’ cumulative frequency. ‘More than’ 
type cumulative frequency table corresponding to data in Table 
2.9 is also given in Table 2.10. 
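
Both kinds of cumulative frequency can be computed directly from the class frequencies. The sketch below uses the frequencies and true class limits of Table 2.9; the variable names are hypothetical, and the cumulative values are computed by the code rather than copied from Table 2.10.

```python
from itertools import accumulate

true_limits = [(19.5, 29.5), (29.5, 39.5), (39.5, 49.5), (49.5, 59.5),
               (59.5, 69.5), (69.5, 79.5), (79.5, 89.5)]
frequencies = [4, 6, 8, 12, 9, 7, 4]

# 'Less than' c.f.: running total from the lowest class upwards,
# referred to the true upper limit of each class.
less_than = list(accumulate(frequencies))

# 'More than' c.f.: running total from the highest class downwards,
# referred to the true lower limit of each class.
more_than = list(accumulate(reversed(frequencies)))[::-1]

for (lower, upper), lt, mt in zip(true_limits, less_than, more_than):
    print(f"less than {upper}: {lt:2d}    more than {lower}: {mt:2d}")
```

For instance, 18 students scored less than 49.5 and 20 students scored more than 59.5.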


2.6. Relative Frequency Distribution

Sometimes it is desirable to express the frequencies in a
class in the form of a proportion or percentage of the total
frequency. This helps in comparing the distributions of
frequencies for two sets of data. The relative frequency of a
class is obtained by dividing the class frequency by the total
frequency. A table listing relative frequencies in each class is
accordingly called a relative frequency table. If we multiply each
relative frequency by 100, we get a percentage distribution of the
frequencies. The relative and percentage frequency table for the
grouped data in Table 2.9 is given in Table 2.11.
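
The computation of relative and percentage frequencies is sketched below in Python for the class frequencies of Table 2.9. The variable names are hypothetical; the values are rounded only to avoid floating-point noise in the printout.

```python
classes = ["20-29", "30-39", "40-49", "50-59", "60-69", "70-79", "80-89"]
frequencies = [4, 6, 8, 12, 9, 7, 4]

total = sum(frequencies)  # N, the total frequency

# Relative frequency = class frequency / total frequency.
relative = [round(f / total, 2) for f in frequencies]

# Percentage frequency = relative frequency x 100.
percentage = [round(100 * f / total, 1) for f in frequencies]

for c, f, r, p in zip(classes, frequencies, relative, percentage):
    print(f"{c}  f={f:2d}  relative={r:.2f}  percentage={p:.0f}")
```

The relative frequencies add up to 1 and the percentage frequencies to 100, which is a useful check on the arithmetic.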


2.7. Tabulation

Tabulation is the process of summarizing classified data in
the form of a table, so that it is easily understood and an
investigator is quickly able to locate the desired information. A
table is a systematic arrangement of classified data in columns
and rows. Thus, a statistical table makes it possible for the
investigator to present a huge mass of data in a detailed and
orderly form. It facilitates comparison and often reveals
certain patterns in data which are otherwise not obvious.
‘Classification and tabulation’, as a matter of fact, are not two
distinct processes. Actually they go together. Before tabulation,
data are classified and then displayed under different columns
and rows of a table.


TABLE 2.11

Relative and percentage frequency table

Scores      Frequency     Relative      Percentage
classes     f             Frequency     Frequency
20—29        4            0.08          .08 × 100 = 8
30—39        6            0.12          .12 × 100 = 12
40—49        8            0.16          .16 × 100 = 16
50—59       12            0.24          .24 × 100 = 24
60—69        9            0.18          .18 × 100 = 18
70—79        7            0.14          .14 × 100 = 14
80—89        4            0.08          .08 × 100 = 8

Total       N=50          1.00          100


2.8. Preparing a Table 

The making of a compact table is itself an art. A table should
contain all the information needed within the smallest possible
space. The purpose of tabulation and the way the tabulated
information is to be used are the main points to be kept in
mind while preparing a statistical table. An ideal table
should consist of the following main parts:

2.8.1. Table number
2.8.2. Title of the table
2.8.3. Captions
2.8.4. Stubs
2.8.5. Body of the table
2.8.6. Footnotes
2.8.7. Sources of data


2.8.1. Table number

A table should be numbered for easy reference and identification.
This number, if possible, should be written in the
centre at the top of the table. Sometimes it is written just before
the title of the table.


2.8.2. Title 


A good table should have a clearly worded, brief but 
unambiguous title explaining the nature of data contained in 
the table. It should also state arrangement of data and the 
period covered. This should be placed centrally on the top of a 
table, just below the table number (or just after table number 
in the same line). 


2.8.3. Captions 

Captions in a table stand for brief and self-explanatory
headings of vertical columns. Captions may involve headings and
sub-headings as well. The units of data contained should also be
given for each column. Usually, a relatively less important and
shorter classification should be tabulated in the columns.


2.8.4. Stubs 


Stubs stand for brief and self-explanatory headings of
horizontal rows. Normally, a relatively more important
classification is given in rows. Also a variable with a large
number of classes is usually represented in rows. For example,
rows may stand for score classes and columns for data related to
the sex of students. Thus, there will be many rows for score
classes but only two columns for male and female students.


2.8.5. Body 


The body of the table contains the numerical information or
frequency of observations in the different cells. This
arrangement of data is according to the description of captions
and stubs.


2.8.6. Footnotes 


Footnotes are given at the foot of the table for explanation 
of any fact or information included in the table which needs 
some explanation. Thus, they are meant for explaining or 


providing further details about the data, that have not been 
covered in title, captions and stubs. 


Measurements, the Frequency Distribution and Tabulation 33 


2.8.7. Sources of Data 

Lastly, one should also mention the source of information
from which the data are taken. This may preferably include the
name of the author, volume, page and the year of publication.
This should also state whether the data contained in the table
are of primary¹ or secondary² nature.


A model structure of a table is given below:

Model structure of a Table

                     Table Number
                  Title of the Table

Stub            Caption Headings                   Total
Headings        <-- Column Sub-headings -->

Stub
entries                    Body

Total

Footnotes: 1—
           2—

Source Note: 1—
             2—


1. The data gathered by actual observation, measurement, count and
direct recording during the course of investigation are called
primary data.

2. Any data detached from the original source, reprocessed for
one's own purpose or published by an agency other than the one
which originally gathered them, are called secondary.


2.9. Requirements of a good Table

A good statistical table is not merely a careless grouping of
columns and rows but should be such that it summarizes the
total information in an easily accessible form in the minimum
possible space. Thus while preparing a table, one must have a
clear idea of the information to be presented, the facts to be
compared and the points to be stressed. Though there is no
hard and fast rule for forming a table, a few general points
should be kept in mind:

1. A table should be formed in keeping with the objects
of a statistical enquiry.

2. A table should be scientifically prepared (as discussed
above) so that it is easily understandable.

3. A table should be formed so as to suit the size of the
paper. But such an adjustment should not be at the cost
of legibility.

4. If the figures in the table are large, they should be
suitably rounded or approximated. The method of
approximation and units of measurement too should
be specified.

5. Rows and columns in a table should be numbered
and certain figures to be stressed may be put in a ‘box’
or ‘circle’ or in bold letters.

6. The arrangement of rows and columns should be in a
logical and systematic order. This arrangement may be
alphabetical, chronological or according to size.

7. The rows and columns are separated by single, double or
thick lines to represent various classes and sub-classes
used. The corresponding proportions or percentages
should be given in adjoining rows and columns to
enable comparison. A vertical expansion of the table is
generally more convenient than a horizontal one.

8. The averages or totals of different rows should be given
at the right of the table and those of columns at the
bottom of the table. Totals for every sub-class too
should be mentioned.

9. In case it is not possible to accommodate all the
information in a single table, it is better to have two or
more related tables.


2.10. Type of Tables

Tables can be classified according to their purpose, stage
of enquiry, nature of data or number of characteristics used.
On the basis of the number of characteristics, tables may be
classified as follows:

2.10.1. Simple or one-way table.
2.10.2. Two-way table.
2.10.3. Manifold table.


2.10.1. Simple or one-way table

A simple or one-way table contains data of one characteristic
only. A simple table is easy to construct as well as to
follow. For example, the following blank Table 2.12 may be
used to show the number of adults in different occupations in
a locality.


TABLE 2.12 


The number of adults in different occupations 
in a locality 


Occupations          No. of adults


Total


2.10.2. Two-way Table

A table which contains data on two characteristics is called a
bivariate or two-way table. In such a case, therefore, either the
stub or the caption is divided into two co-ordinate parts. In
Table 2.12, as an example, the caption may be further divided in
respect of ‘sex’. This sub-division is given in the following
two-way Table 2.13, which now contains two characteristics, namely
occupation and sex.


TABLE 2.13 


The number of adults in a locality in respect of
occupation and sex

                     No. of adults
Occupations          Male        Female        Total


Total


2.10.3. Manifold Table 

Thus, more and more complex tables can be formed by
including other characteristics. For example, we may further
classify the captions of Table 2.13 in respect of ‘marital status’,
‘religion’ and ‘socio-economic status’ etc. A table in which
more than two characteristics of data are considered is called
a manifold table. For example, Table 2.14 shows three
characteristics, namely, occupation, sex and marital status.


TABLE 2.14 


The number of adults in a locality in respect of 
occupation, sex and marital status 


                          No. of adults
Occupations       Male          Female        Total
                  M     U       M     U       M     U


Total


Footnote: M stands for married and U for unmarried. 


Manifold tables, though complex, are good in practice as
these enable full information to be presented and facilitate
an analysis of all related facts. Still, as a normal practice, not
more than four characteristics should be represented in one
table to avoid confusion. Other related tables may be formed to
show the remaining characteristics.


Problem Set 2
1. Define the following concepts with examples: 


(i) A variable and a variate
(ii) Discrete variable 

(iii) Continuous variable 

(iv) Quantitative variable 

(v) Qualitative variable 

(vi) Ratio scale 

(vii) Interval scale 


2. Define classification and explain the various ways of 
classification. 

3. What do you understand by classification ? Discuss its 
importance in statistical analysis. 

4. Explain the following terms: 


(i) Class-interval

(ii) Class-frequency

(iii) Class-limits

(iv) Class mid-point

(v) Frequency distribution

(vi) Frequency table

(vii) Cumulative frequency table

(viii) Relative frequency table


5. Distinguish between: 


(i) Simple and manifold classification
(ii) Discrete and continuous variable
(iii) Inclusive and exclusive method of class formation
(iv) ‘More than’ and ‘less than’ cumulative frequency
distributions.


6. Explain the general principles of classification of data
for forming a frequency distribution with particular
reference to the choice of class-interval and number of
classes.

7. Discuss the steps in the construction of a frequency
distribution from raw data.

8. What is a statistical table? Distinguish between one-way,
two-way and higher order tables with examples.

9. State clearly the essentials of a good table.

10. Discuss the main parts of a model table.

11. The following are marks scored by 100 examinees in
statistics out of a maximum of 100.


51 44 8 «75° 00. 18445 14 “00° 04 
64 69 "72 Si 469 346 а 5227 434: 108 
58 «83 = 20) HO Зов 0085 38° O 
51.5 ise 17220932 1650 SO ENT — 58: 722 
бас зо у 75070831 24 46 US 00 “16 165 
95109568 20 GLO „6з - 547,09 104, * 132" HO 
74 74487 555? 11520. 660$, 55313 50 100.35 
25. 251 0538 33. 20 054, 4528 (48 ваа SO 
94 90 з 84 30 58 20 00 99 42 
79 33 38 60 61 36 10 34 02 80 


Prepare a frequency distribution table with a class- 
interval of 10 using (i) exclusive classes (ii) inclusive 


classes. 


12. Prepare ‘less than’ and ‘more than’ cumulative frequency
tables from the following table:

Classes :    10—19   20—29   30—39   40—49   50—59

Frequency :    5       10      16       6       3

Also form a relative frequency table.


13. Draw а blank table to show the numbers of candidates 
sex-wise, appearing in the first year, second year and 
third year examinations of a university in the faculties 


of Arts, Science and Commerce in a certain year. 


14. Draw a blank table to present the following information
regarding the college students according to:

(a) Faculty : Social Science, Commercial Sciences.
(b) Class : Under-graduate and Post-graduate classes.
(c) Sex : Male and Female.
(d) Year : 1970 and 1974.


Answers

11. (i) Frequency distribution        (ii) Frequency distribution
        (Exclusive Classes)                (Inclusive Classes)

        Class      Frequency               Class      Frequency
        0—10         12                    0—9          12
        10—20         6                    10—19         6
        20—30         9                    20—29         9
        30—40        18                    30—39        18
        40—50         9                    40—49         9
        50—60        18                    50—59        18
        60—70        11                    60—69        11
        70—80         6                    70—79         6
        80—90         6                    80—89         6
        90—100        5                    90—99         5

        Total       100=N                  Total       100=N

12. True Class     Frequency    Less than    More than    Relative
    limits                      c.f.         c.f.         frequency
    9.5—19.5          5            5           40          0.125
    19.5—29.5        10           15           35          0.250
    29.5—39.5        16           31           25          0.400
    39.5—49.5         6           37            9          0.150
    49.5—59.5         3           40            3          0.075

    Total           N=40



13. [Blank table showing the number of candidates, sex-wise,
appearing in the 1st year, 2nd year and 3rd year examinations of a
university in the faculties of Arts, Science and Commerce, with
totals; the table layout is not legible in the source.]


Elementary Statistics in Psychology and Education 


14. Table showing information regarding college students
according to faculty, class, sex and year. [The table layout is
not legible in the source.]

Note: U.G. = Under-graduate; P.G. = Post-graduate;
S.S. = Social Sciences; C.S. = Commercial Sciences.


3 


GRAPHICAL PRESENTATION 


3. Introduction 

In the previous chapter we discussed the classification and
tabulation of ungrouped data, which helped us in summarizing
and presenting them in a systematic manner. However, this
grouping of data does not always appeal to the common man,
since too many figures are generally confusing and fail to
convey an impression of the pattern of distribution of the
frequencies. Sometimes even an expert investigator does not
get an accurate conception of the shape of the frequency
distribution from grouped data. On the other hand, pictorial or
graphical presentation of data seems to be more appealing
and more effective in conveying an idea of the pattern of
distribution. Such representation provides a good visual impression
of the important features of the entire mass of data. It is appealing
to the eye, has a greater memorizing effect and facilitates
comparison of several different frequency distributions. When
the data are depicted pictorially or graphically, we call it
graphical presentation of the data.

There are many different forms of graphical representation.
Some of them, which are used for representing a frequency table,
are:

3.2. Histogram
3.3. Frequency polygon
3.4. Frequency curve
3.5. Cumulative frequency curve or 'Ogive'.

3.1. The System of Coordinate axes; General principles

Before considering the methods of constructing a histogram,
frequency polygon or cumulative frequency curve or ogive, we
discuss briefly the simple principles which are applicable to all
graphic presentation of data. Graphing is done with reference
to two perpendicular lines called coordinate axes. The vertical
line (YY') is called the y axis and the horizontal line (XX') the x axis.
The point of their intersection is called the origin. Figure 3.1
represents a system of coordinate axes.


Fig. 3.1. The system of coordinate axes, showing quadrants
I (+, +), II (−, +), III (−, −) and IV (+, −).


In Fig. 3.1, the origin, denoted by O, is the zero point or
reference point for the two axes. All distances measured along
the x axis to the right of O are called positive and those to the
left of O negative. Similarly, distances measured along the y
axis above the origin are positive, and below it negative. The four
divisions resulting from the intersection of the two axes are called
quadrants. In the upper right division or first quadrant, both
x and y measures are positive (+, +). In the upper left division
or second quadrant, x is negative and y positive (−, +). In the
lower left division or third quadrant, both x and y measures are
negative (−, −); while in the lower right or fourth quadrant,
x is positive and y negative (+, −). These four quadrants have
been marked in Figure 3.1. The distance of a point from the origin
along the x axis is called the abscissa, and the distance of the point
from the origin along the y axis the ordinate. It is conventional to write
a point as $(x_i, y_i)$, where $x_i$ is the abscissa and $y_i$ is the
ordinate. For example, the abscissa of the point P(7, −3) in
Figure 3.1 is 7, and the ordinate −3. Thus, for locating a point
Q(5, 3), we move from O five units to the right on the x axis, and up
from the origin three units on the y axis. Where the perpendiculars
to these points intersect, we locate the point Q (Fig. 3.1).
Similarly, a point R(−2, −7) is located in the third quadrant. In this
situation, we move from O two units to the left on the x axis, and
then down seven units from O on the y axis. The point R is located
where the perpendiculars to these points intersect (see Fig. 3.1).
In like manner, any point whose coordinates are given can be
located with reference to the coordinate axes.
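The sign rules for the four quadrants can also be written as a short Python sketch. The function name `quadrant` and the convention of returning 0 for a point lying on an axis are assumptions made for this illustration; they are not part of the text.

```python
def quadrant(x, y):
    """Return the quadrant number (1 to 4) of the point (x, y).

    Returns 0 when the point lies on an axis or at the origin
    (an assumed convention for this sketch).
    """
    if x == 0 or y == 0:
        return 0
    if x > 0:
        return 1 if y > 0 else 4  # (+, +) or (+, -)
    return 2 if y > 0 else 3      # (-, +) or (-, -)


# The three points discussed above.
for name, point in {"P": (7, -3), "Q": (5, 3), "R": (-2, -7)}.items():
    print(name, point, "lies in quadrant", quadrant(*point))
```

Here P falls in the fourth quadrant, Q in the first and R in the third, in agreement with the signs of their coordinates.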


3.2. Histogram


3.2.1. Histogram (equal class-interval)

The histogram is one of the most popular and widely used methods
of presenting a frequency distribution. A histogram is a set of
rectangles whose areas are in proportion to class frequencies.
Thus it is not only a graphical record of absolute class
frequencies but also provides a comparison of class frequencies.

A histogram may be best constructed on graph paper,
which is ruled with equally spaced horizontal and vertical lines.
For example, let us see how the histogram for the frequency
distribution in Table 2.9 can be constructed. Here inclusive
classes are given and, as a first step, these should be converted
into classes with true or actual class limits as given in the second
column of Table 2.9. These true class limits are then plotted
along the horizontal axis (x-axis) and class frequencies on
the vertical axis (y-axis) with the help of a suitable scale of
measurement. Generally, a vacant class is also allowed at either
end of the horizontal scale (see Fig. 3.2). This improves the
readability of the graph and is also useful in the construction
of a frequency polygon, which will be discussed in the next
section.

In order to give symmetry and balance to a histogram (or any
graphic presentation), one needs to be careful in the selection of
unit distances to represent class limits on the x-axis and the
frequencies on the y-axis. For representing these distances, the scales of
measurement on the two axes are so selected that the height
of the histogram (or any other graphic presentation) is
approximately 75 per cent of its width. This ratio may,
however, vary from 65 to 85 per cent for conveniently representing
various frequency distributions with a balanced figure.

Fig. 3.2. Histogram; Data in Table 2.9.


Having marked off the two scales, a rectangle over each
class is constructed in such a way that its area is proportional
to the corresponding class frequency. This can be done very
easily when we are given classes with equal class-intervals. In
such a case the equal width of the class is treated as one unit and
rectangles over classes are so formed that their heights are
proportional to class frequencies. A histogram representing the
frequency distribution of scores in Table 2.9 is shown in
Figure 3.2. In this figure, the height of the rectangle formed
over class 19.5-29.5 is 4 units along the vertical scale
(representing frequency) and as such its area becomes 4 × 1 = 4 square
units, which is equal to the frequency of the class. Similarly,
heights of other rectangles formed over consecutive classes are
taken as 6, 8, 12, 9, 7 and 4 respectively so that their areas too
are equal to corresponding class frequencies.


3.2.2. Histogram (unequal class-interval)

To take an example, let us arbitrarily group classes 30-39
and 40-49 into one class as 30-49 in Table 2.9. Similarly,
classes 60-69 and 70-79 are grouped as 60-79. Grouped
classes and corresponding frequencies thus obtained are shown
in Table 3.1.


TABLE 3.1

Data in Table 2.9 grouped into unequal class-intervals

Scores     True class     Class       Class        Height of
classes    limits         interval    frequency    rectangles

20-29      19.5-29.5        10           4         4/1 = 4
30-49      29.5-49.5        20          14         14/2 = 7
50-59      49.5-59.5        10          12         12/1 = 12
60-79      59.5-79.5        20          16         16/2 = 8
80-89      79.5-89.5        10           4         4/1 = 4

Total                                 N = 50
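The last column of Table 3.1 divides each class frequency by the class width expressed in units of the smallest class-interval. The sketch below, in Python, reproduces that column; the function name `rectangle_heights` is an assumption chosen for illustration.

```python
def rectangle_heights(true_limits, frequencies):
    """Heights of histogram rectangles for possibly unequal classes.

    true_limits: list of (lower, upper) true class limits.
    The smallest class-interval is taken as one unit of width, so that
    height x width (in units) equals the class frequency.
    """
    widths = [upper - lower for lower, upper in true_limits]
    unit = min(widths)
    return [f / (w / unit) for f, w in zip(frequencies, widths)]


limits = [(19.5, 29.5), (29.5, 49.5), (49.5, 59.5), (59.5, 79.5), (79.5, 89.5)]
frequencies = [4, 14, 12, 16, 4]
print(rectangle_heights(limits, frequencies))  # [4.0, 7.0, 12.0, 8.0, 4.0]
```

Multiplying each height by its width in units (1 or 2) gives back the class frequencies 4, 14, 12, 16 and 4.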


In Table 3.1, the class-interval of the second and fourth classes is
twice that of the rest of the classes. Thus the frequencies in the
second and fourth classes are not comparable with those of the other classes.
To establish this comparability, the frequencies in the larger
classes should be halved, i.e., divided by two. Thus, before forming
a histogram for a frequency distribution with unequal
class-intervals, all larger classes should be expressed as multiples of
smaller classes, and the corresponding class frequencies divided
by these multiples. This division, then, gives the heights of the
rectangles (as given in the last column of Table 3.1) to be
formed over the given classes. Now, if the class-interval 10 of the
class 19.5-29.5 is taken as one unit, then the class-interval 20 of the
class 29.5-49.5 will be equal to 2 units. Thus, a rectangle of
height 7 formed over 29.5-49.5 will represent an area 7 × 2 = 14
which is, in accordance with the principle of the histogram, equal
to the class frequency. Similarly, the height of the rectangle to be
formed over class 59.5-79.5 will be 8 so that its area 8 × 2 = 16
may represent the class frequency. However, heights of the other
rectangles formed over classes of unit length will remain
equal to the corresponding class frequencies. The histogram
obtained for the frequency distribution of scores in Table 3.1 is given
in Figure 3.3.

Fig. 3.3. Histogram; Frequency Table 3.1.


We have seen that a histogram is a graphical representation of
a frequency distribution. It is supposed to represent all the
characteristics of a good table. A histogram too should be given
a clear, brief and self-explanatory title. An index showing the scales
of measurement on both axes should be given in the upper
right corner of the figure, and units of measurement should be
mentioned along the axes so as to make the graph easily
readable.


3.3. The Frequency Polygon 

A frequency polygon is another graphical representation of a
frequency distribution in the form of a polygon superimposed
on a histogram by joining with straight lines the mid-points of
the tops of the adjacent rectangles. The two end-points are also
joined to the x-axis at the mid-points of the empty classes
at each end of the frequency distribution. In a frequency
polygon so constructed the area of the polygon is the same
as that of the histogram. A frequency polygon, superimposed
on a histogram, for the data in Table 2.9 is given in Fig. 3.4.

Fig. 3.4. Frequency polygon superimposed on histogram; Data in
Table 2.9.

A frequency polygon can also be drawn by joining the
successive plotted points whose abscissae (distances along the
x-axis) represent the mid-points of the classes and whose ordinates
(distances along the y-axis) represent the corresponding class
frequencies. In Table 2.9, the mid-points of the classes are
24.5, 34.5, 44.5, 54.5, 64.5, 74.5 and 84.5.

Fig. 3.5. Frequency polygon; Data in Table 2.9.

In Fig. 3.5, one observes that the first point (14.5) located on
the x-axis is a bit away from the origin O. In some distributions,
this starting class or point may be farther away from the origin,
making it difficult to accommodate the rest of the classes or
points on the x-axis with a suitable scale of measurement. For
locating classes or points away from the origin, we may use a break
in the x-axis. The break, marked on the axis by a break symbol, indicates
that the y-axis has been moved towards the starting point on the x-axis for
conveniently representing all the classes or points on this axis.
The concept will be clearer from Fig. 3.5, which represents
the frequency polygon for the frequency distribution in Table
2.9.

The main object of drawing a frequency polygon is to get a 
continuous frequency curve so that it may provide an idea about 
the shape of the frequency distribution. 
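The points to be joined in a frequency polygon can be listed with a short Python sketch. It assumes equal class-intervals and uses the true class limits and frequencies of Table 2.9 as they are quoted in this chapter; the function name `polygon_points` is hypothetical.

```python
def polygon_points(true_limits, frequencies):
    """Return the (mid-point, frequency) pairs of a frequency polygon.

    Assumes equal class-intervals. The polygon is closed on the x-axis
    at the mid-points of the empty classes on either side.
    """
    width = true_limits[0][1] - true_limits[0][0]
    mids = [(lower + upper) / 2 for lower, upper in true_limits]
    points = list(zip(mids, frequencies))
    return [(mids[0] - width, 0)] + points + [(mids[-1] + width, 0)]


limits = [(19.5 + 10 * k, 29.5 + 10 * k) for k in range(7)]
frequencies = [4, 6, 8, 12, 9, 7, 4]
for x, y in polygon_points(limits, frequencies):
    print(x, y)
```

The first and last points, (14.5, 0) and (94.5, 0), are the mid-points of the vacant classes at the two ends of the distribution.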


3.4. The Frequency curve 

When, generally, a smaller number of observations are
grouped into classes of relatively large class-intervals, the
resulting polygon is jagged in outline. Then it is reasonable to
suppose that the progression of class frequencies can be
considerably smoothened by considering a larger number of
observations grouped into a larger number of relatively small
classes. A frequency curve is the smooth curve that would
presumably emerge if the class-interval is made as small as
possible and the number of observations is very large.

A frequency curve is obtained by approximating a histo- 
gram or a frequency polygon by a smooth free hand curve. 
The frequency curve for the frequency distribution in Table 2.9 
is shown in Fig. 3.6. 

To get a good frequency curve it is always desirable to
draw a histogram and then a polygon for the given frequency
distribution. Like the histogram and frequency polygon, the frequency
curve is also an area diagram. Thus in drawing a frequency
curve it should be kept in mind that the total area under the
curve is approximately equal to the area under the histogram or
frequency polygon.


3.4.1. Forms of Frequency curves 

Frequency curves are graphical representations of frequency
distributions which may vary greatly in shape depending on the
pattern of distribution of frequencies. The following are some
types of curves which we generally get while representing
frequency distributions graphically:


(a) Symmetrical bell shaped curve 
(b) Moderately asymmetrical or Skew curve 


(c) Extremely asymmetrical or J-shaped 
(d) U-shaped 


3.4.1. (a) Symmetrical bell shaped curve 

A frequency curve is said to be perfectly symmetrical if it
can be folded along a vertical line in such a way that the two
halves of the figure coincide, as in Fig. 3.7. Suppose a test well
suited to the abilities of students is administered. Then, most
of the scores obtained will concentrate around the average
score. A few students will score quite high, a few quite low and
the majority will fall in score classes near the middle of the
scale. A frequency polygon obtained for such a distribution,
where there are few frequencies in classes on both ends and a
majority of them are concentrated around the average score,
can be very well approximated by a bell shaped symmetrical
curve.

Fig. 3.7. A symmetrical curve.


3.4.1. (b) Moderately asymmetrical or Skew Curve 

A skew or moderately asymmetrical curve lacks symmetry.
In such a curve observations tend to accumulate at one or the
other end of the curve. These curves are obtained for frequency
distributions where the majority of observations have a tendency to
accumulate in lower or higher classes. For example, if a rather
easy test is given to a group of students, most of the scores
will lie in higher score classes; and if a test is a little difficult,
the majority of scores will fall in lower score classes. In both
cases, the scores tend to pile up in classes on one or the other
end of the distribution, and accordingly we get two types of
skew curves, namely:

(i) Positively Skew Curve
(ii) Negatively Skew Curve


(i) Positively Skew Curve 

If a curve has a long tail on the right side, it is called a positively
skewed curve. Such a curve results from a frequency distribution
where observations concentrate in lower classes, as shown in
Figure 3.8.

(ii) Negatively Skew Curve

On the other hand, a curve having a long tail on the left is
called a negatively skewed curve. This represents a frequency
distribution in which comparatively more scores fall in higher
classes. The shape of a negatively skewed curve is also given
in Figure 3.8.

Positively skewed curve        Negatively skewed curve

Fig. 3.8. Skewed curves.


3.4.1. (c) Extremely asymmetrical or J-Shaped Curve 

In an extremely asymmetrical curve a variate has the
tendency to have maximum frequency on one end and the
minimum on the other end. The shapes of the curves arising
for such distributions are like the letter J, and as such these are
also called J-shaped curves, as given in Fig. 3.9. Suppose a very
easy test is administered to a group of persons; then the majority of
the group will score in the highest class of scores and a
continuously declining number will score in the lower classes of
scores. In such a situation, we may get a
negatively skewed J-shaped curve as shown in Figure 3.9.

Negatively J-shaped        Positively J-shaped

Fig. 3.9. J-shaped curves.


3.4.1. (d) U-shaped Curve 
Suppose, in a test administration, persons either score in the
lowest class of scores or in the highest class of scores and very
few score in intermediate classes. The frequency distribution of
scores, in such a situation, gives rise to a U-shaped curve as
shown in Figure 3.10.

Fig. 3.10. U-shaped curve.

Such a distribution of scores may arise when a test is not
well suited to the abilities of the group of persons. The group
may consist of persons with distinct abilities, some very
high and the others very low.


3.5. Cumulative Frequency curve or ‘Ogive’ 

A cumulative frequency curve or 'Ogive' is a graphical
representation of a cumulative frequency distribution. The 'less
than' or 'more than' cumulative frequency tables in Table 2.10
give either the number of students scoring below the upper
class limits or above the lower class limits. But these tables fail
to provide the number of students scoring below or above an
intermediate score within a class. For example, Table 2.10
shows that 10 students have scored below 39.5 and 46 have
scored above 29.5, but it fails to give the number of students
scoring below or above 35, which is an intermediate score in
class 29.5-39.5. We may further need, for example, the number
of students in various grades, say A, B or C, where grade A is
given to a score of 65 or more, B for above 45 and below 65
and C for the rest. Though this sort of information could be
obtained by using arithmetic interpolation between class limits,
it can be obtained from ogives with less effort and sufficient
accuracy. Ogives are also drawn for determining the median, a
measure of central tendency, as will be discussed in the next
chapter.

There are two forms of ogive depending on the types of
cumulative frequencies, namely:


3.5.1 Less than ‘ogive’ 
3.5.2 More than ‘ogive’. 


3.5.1. Less than ‘Ogive’ 

In the construction of 'ogives' also, we use true class limits.
In the formation of the less than 'ogive', upper true limits of the
classes are plotted along the horizontal axis and the
corresponding less than cumulative frequencies are marked along the
vertical axis. The points so obtained are then joined by a free hand
smooth curve to get the less than 'ogive', as given in Fig. 3.11 for the
cumulative frequency distribution in Table 2.10.

Fig. 3.11. Ogive Less Than; Cumulative frequency Table 2.10.

The ogive in Fig. 3.11 can now be used to get the number of
students scoring below a given score. For example, suppose the
number of students scoring below 35 is to be estimated. Then
35 is first marked on the horizontal axis as a point, say A, and a
vertical line drawn through this point cuts the ogive at some
point, say B. This cutting point is then projected on the vertical
axis to a point C. The point C, when measured on the vertical
axis, gives the required number of students, which is about 7.
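Reading a value off the less than ogive can be imitated by arithmetic interpolation between class limits. The Python sketch below is a minimal illustration that assumes straight-line segments between the plotted points (the free hand curve in the text is smooth, so a graphical reading may differ slightly). The function name `less_than_count` is hypothetical, and the cumulative frequencies are rebuilt from the class frequencies 4, 6, 8, 12, 9, 7 and 4 quoted earlier for Table 2.9.

```python
from itertools import accumulate


def less_than_count(boundaries, frequencies, score):
    """Estimate the number of observations falling below `score`.

    boundaries: true class limits in increasing order (one more entry
    than there are classes). Interpolation is linear within a class.
    """
    # Cumulative frequency at each boundary, starting from zero.
    cumulative = [0] + list(accumulate(frequencies))
    if score <= boundaries[0]:
        return 0.0
    for i in range(1, len(boundaries)):
        lower, upper = boundaries[i - 1], boundaries[i]
        if score <= upper:
            fraction = (score - lower) / (upper - lower)
            return cumulative[i - 1] + fraction * frequencies[i - 1]
    return float(cumulative[-1])


boundaries = [19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5]
frequencies = [4, 6, 8, 12, 9, 7, 4]
print(round(less_than_count(boundaries, frequencies, 35), 1))  # 7.3
```

The estimate of 7.3 lies between the tabulated values 4 (below 29.5) and 10 (below 39.5), so about 7 students score below 35.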


3.5.2. More than ‘Ogive’ 
For the more than 'ogive', true lower limits of the classes are
plotted along the horizontal axis and the corresponding more
than cumulative frequencies are marked along the vertical axis.
The points are then joined by a smooth free hand curve to
give the more than 'ogive' as shown in Fig. 3.12. The cumulative
frequency distribution in Table 2.10 has been used for the
purpose.

Fig. 3.12. Ogive More Than; Cumulative frequency Table 2.10.

The more than 'ogive' in Fig. 3.12 can now be readily used to
find the number or percentage of students scoring above a
given score. For example, in estimating the number of students
scoring above 45, we first mark 45 on the horizontal axis as A.
Now, a vertical line through this point cuts the ogive at point
B. This point B is then finally projected to the vertical axis as
point C. This point C, when measured on the vertical axis, gives
the required estimated number of students as 36.
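The same reading can be sketched for the more than ogive. The helper `read_ogive` below is a hypothetical name; it interpolates linearly between plotted points, which is an assumption standing in for the free hand curve. The more than cumulative frequencies are derived from the class frequencies of Table 2.9 quoted in this chapter.

```python
def read_ogive(limits, cumulative, score):
    """Linearly interpolate a cumulative frequency at `score`.

    limits: class limits in increasing order; cumulative: the
    cumulative frequencies plotted above those limits.
    """
    if not limits[0] <= score <= limits[-1]:
        raise ValueError("score lies outside the plotted range")
    for i in range(1, len(limits)):
        if score <= limits[i]:
            fraction = (score - limits[i - 1]) / (limits[i] - limits[i - 1])
            return cumulative[i - 1] + fraction * (cumulative[i] - cumulative[i - 1])


# True lower limits, with the last upper limit added so the curve ends at zero.
lower_limits = [19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5]
more_than_cf = [50, 46, 40, 32, 20, 11, 4, 0]
print(round(read_ogive(lower_limits, more_than_cf, 45)))  # 36
```

The interpolated value is 35.6, which rounds to the 36 students read from Fig. 3.12.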


Problem Set 3 


1. State briefly the purpose served by the graphical
representation of a frequency distribution.

2. Define the following concepts:

(i) Histogram
(ii) Frequency polygon
(iii) Frequency curve
(iv) Ogive

3. What do you understand by a cumulative frequency
distribution? Point out its special advantages and uses.

4. Name the various ways of presenting a frequency
distribution graphically.

5. Prepare (a) Histogram (b) Frequency polygon for the
following frequency distribution of grades in a final
examination.


Class     Frequency     Class     Frequency
10-19         6         50-59        12
20-29        12         60-69         8
30-39        20         70-79         6
40-49        14         80-89         2

6. The following table gives the marks of 100 students in
the subject 'English':

Marks     Students     Marks     Students
0-9           5        40-49        15
10-19        15        50-59        10
20-29        18        60-69         5
30-39        30        70-79         2

Draw 'less than' and 'more than' type ogives. Using those
curves find the number of students:

(i) with marks less than 45.
(ii) with marks more than 65.
(iii) with marks between 45 and 65.

7. Draw ogive for the data in problem 5.

8. Draw histogram, frequency polygon and frequency 
curve for the frequency distribution in problem 6. Also 
mention the form of the frequency curve. 

9. What are different forms of a frequency curve? Describe 
their nature in brief. 

10. Prepare histogram, frequency polygon for the following
frequency distribution:

Classes:     10-20   20-40   40-50   50-70   70-80   80-90
Frequency:     7      20      25      18      11       6

11. Prepare histogram for the following cumulative
frequency distribution:

Classes          Frequency
Less than 10         5
Less than 20         8
Less than 30        15
Less than 40        29
Less than 50        36
Less than 60        45
Less than 70        53
Less than 80        60

Hint: First convert the present table to a simple frequency
table.


Answers

6. (i) 76 (ii) 4 (iii) 20


4 


MEASURES OF CENTRAL TENDENCY 


4. Introduction 

In chapter 2, the raw data consisting of scores of 50 students
in a statistics examination were organised into a frequency
table (Tables 2.8 and 2.9). The frequency distribution and its
graphical presentation gave us information about the
performance of the students in the examination.

Now suppose the students from two or more classes
appeared in the examination and we want to compare the
performance of the classes in the examination, or are interested
in comparing the performance of the same class after some
coaching over a period of time. When making these
comparisons, it is not practical to use the full frequency
distributions, however compactly these may be presented. Therefore,
for such a statistical analysis, we need a single representative
value that describes the entire mass of data given in a frequency
distribution. This single representative value is called a central
value, a measure of location or an average, around which individual
values of a variable cluster. Since this central value or
average enables us to get a gist of the entire mass of data, its
value lies somewhere in the middle between the two extremes,
i.e., the minimum and the maximum values of the variable.
For this reason such a central value or average is frequently
called a measure of central tendency. Clearly, the concept of a
measure of central tendency is concerned only with quantitative
variables and is not defined for qualitative variables, which have
no scale to be measured on.
There are different measures of central tendency, out of
which the following three are most commonly used:

4.1. The arithmetic average or mean.
4.2. The median.
4.3. The mode.


4.1. The Arithmetic mean or mean 

The arithmetic mean is the most popular and widely used
measure of central tendency, representing the entire mass of
given data by a single value. The arithmetic mean is obtained
by dividing the sum of the given observations by their number:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \tag{4.1}$$

Here, $\bar{x}$ = the arithmetic mean (read as: 'x-bar')

n = the number of given variates or observations.

$\sum_{i=1}^{n} x_i$ = the sum of given variates or observations (read as:
'summation x'). However, in normal use simply $\sum x$ is written
in place of $\sum_{i=1}^{n} x_i$, if there is no confusion about the
number of observations summed.


4.1.1. Computation of arithmetic mean (ungrouped data) 
The arithmetic mean for ungrouped data can be calculated 
by using any of the following two methods: 


4.1.1. (a) Direct method. 
4.1.1. (b) Short-cut method. 


4.1.1. (a) Direct Method

In case of ungrouped data we are always given some
measurements on a variable x, say $x_1, x_2, \ldots, x_n$. Formula
(4.1) is then used for calculating the arithmetic mean in the direct
method. The following examples will clarify the computation
procedure:

Example 4.1. The weights of 5 students are observed as 60,
65, 62, 70 and 63 kgs. respectively. Find the average weight
of the students.

Solution

Here we are given five measurements on a variable weight
(x) which are

$x_1 = 60$, $x_2 = 65$, $x_3 = 62$, $x_4 = 70$ and $x_5 = 63$

Therefore, the average weight, using formula (4.1), will be

$$\bar{x} = \frac{\sum x}{n} = \frac{60 + 65 + 62 + 70 + 63}{5} = \frac{320}{5} = 64 \text{ kgs.}$$
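The direct method of formula (4.1) amounts to one sum and one division. A minimal Python sketch, with the function name `arithmetic_mean` chosen here for illustration:

```python
def arithmetic_mean(values):
    """Direct method: the sum of the observations divided by their number."""
    return sum(values) / len(values)


weights = [60, 65, 62, 70, 63]   # weights in kgs. from Example 4.1
print(arithmetic_mean(weights))  # 64.0
```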


Example 4.2. Find the mean score for the ungrouped data in
Table 2.1.

Solution

In Table 2.1, we are given 50 values of a variable score (x).
Therefore, the arithmetic mean of scores is

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_{50}}{50} = \frac{50 + 61 + \cdots + 50 + 78}{50} = \frac{2791}{50} = 55.82 \text{ marks.}$$

4.1.1. (b) Short-cut method of calculating arithmetic 
mean for ungrouped data 

Sometimes choosing an assumed mean and calculating the
deviations of the given variates or observations from it makes
the calculation of the mean simpler. This assumed mean is usually chosen
to be a neat round number in the middle of the range of the
observations, so that deviations can be easily obtained by
subtraction. Then, a formula, based on deviations from the assumed
mean, for calculating the arithmetic mean becomes:

$$\bar{x} = a + \frac{1}{n}\sum_{i=1}^{n} d_i \tag{4.2}$$

Here, $\sum d$ may be used in place of $\sum_{i=1}^{n} d_i$ to simplify notation.

$\bar{x}$ = the arithmetic mean

a = the assumed mean

$d_i = (x_i - a)$, the deviation of each value of the variable
from the assumed mean.

$\sum d = \sum_{i=1}^{n} d_i$ = the sum of the deviations.
Example 4.3. Calculate the arithmetic mean for the data in
Example 4.1 by using the short-cut method.

Solution

In this example, the ungrouped data range from 60 to
70 kgs. Therefore, 65, a neat round value in the middle of 60
and 70, may be taken as the assumed mean, i.e., a = 65. Deviations
and the sum of deviations needed in formula (4.2) may be
calculated in a table as given below:

TABLE 4.1

Computation of mean (Short-cut method)

S. No.     Weight     Deviations from a = 65
             x             d = (x - a)
1           60                -5
2           65                 0
3           62                -3
4           70                 5
5           63                -2

n = 5                      Σd = -5

Using formula (4.2), the arithmetic mean will be

$$\bar{x} = a + \frac{\sum d}{n} = 65 + \frac{(-5)}{5} = 65 - 1 = 64 \text{ kgs.}$$
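The short-cut method of formula (4.2) can be checked against the direct method with a few lines of Python. The function name `shortcut_mean` is an assumption for this sketch; any assumed mean gives the same result, which the second call illustrates.

```python
def shortcut_mean(values, assumed_mean):
    """Short-cut method: assumed mean plus the average deviation from it."""
    deviations = [x - assumed_mean for x in values]
    return assumed_mean + sum(deviations) / len(values)


weights = [60, 65, 62, 70, 63]
print(shortcut_mean(weights, 65))  # 64.0, deviations sum to -5
print(shortcut_mean(weights, 60))  # 64.0, all deviations are non-negative
```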


Remarks 
1. Comparing the direct and short-cut methods of calculating the
arithmetic mean for ungrouped data, we see that the latter does
not seem to shorten the procedure. Though the short-cut
method uses deviations to reduce calculations to a marked
extent, their computation needs extra tabulation which tends
to make it a rather lengthy procedure. The short-cut method
should be used where the given observations are too large or in
fractions. However, if the observations are not too large or a
machine calculator is available, the direct method is good
enough for calculating the arithmetic mean for ungrouped data.

2. Choosing the assumed mean in the middle of the range
of data gives negative d's, which might be awkward to handle
algebraically. In such a situation, one can choose 'a' as a neat
round number less than or equal to the minimum variate or
observation, so that all the d's are non-negative.


4.1.2. Computation of arithmetic mean for grouped
data: discrete variable

Computation of the arithmetic mean for a discrete frequency
distribution can also be done with the following two
procedures:

4.1.2. (a) Direct Method.
4.1.2. (b) Short-cut Method.


4.1.2. (a) Direct Method

The arithmetic mean for a discrete frequency distribution is
obtained by using the following formula:

$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} = \frac{1}{N}\sum f x \tag{4.3}$$

Here, $\sum f x$ is normally used as a simplified notation for
$\sum_{i=1}^{n} f_i x_i$. Other symbols used are:

$f_i$ = the frequency of the ith variate or observation.
$x_i$ = the value of the ith observation.
$N = \sum_{i=1}^{n} f_i$ = the total frequency.

The steps involved in the computation procedure may be
listed as:

(i) Multiply the frequency (f) with the corresponding variate
to get the column fx. Sum this column to get Σfx.
(ii) Calculate the sum of the given frequencies as Σf = N.
(iii) Use formula (4.3) to get the arithmetic mean.

The procedure will be clearer from the following
example.

Example 4.4. The following table gives the distribution of
marks of 50 students in a class in an examination, the maximum
marks being 10.

Marks (x)             1   2   3   4   5   6   7   8
No. of Students (f)   3   5  10  12   8   6   4   2

Calculate the arithmetic mean.


Solution

The calculations needed for the arithmetic mean should be put
in tabular form as shown in Table 4.2.

TABLE 4.2

Computation of arithmetic mean

S.No.     Marks     No. of Students     fx
            x              f
1           1              3            1 × 3 = 3
2           2              5            2 × 5 = 10
3           3             10            3 × 10 = 30
4           4             12            4 × 12 = 48
5           5              8            5 × 8 = 40
6           6              6            6 × 6 = 36
7           7              4            7 × 4 = 28
8           8              2            8 × 2 = 16

                     Σf = N = 50        Σfx = 211

Using formula (4.3), the arithmetic mean is

$$\bar{x} = \frac{\sum f x}{N} = \frac{211}{50} = 4.22 \text{ marks.}$$
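The three steps of the direct method for a discrete frequency distribution can be sketched in Python as follows. The function name `grouped_mean` is hypothetical; the data are those of Example 4.4.

```python
def grouped_mean(values, frequencies):
    """Direct method for a discrete frequency distribution, formula (4.3)."""
    total_fx = sum(f * x for x, f in zip(values, frequencies))  # step (i)
    n = sum(frequencies)                                        # step (ii)
    return total_fx / n                                         # step (iii)


marks = [1, 2, 3, 4, 5, 6, 7, 8]
students = [3, 5, 10, 12, 8, 6, 4, 2]
print(grouped_mean(marks, students))  # 4.22
```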


4.1.2. (b) Short-cut Method

Though the short-cut method does not necessarily reduce
the computational work in the case of ungrouped data, for
grouped data, where the values of the variable and the
corresponding frequencies are large, this method results in a
considerable saving of time.

According to this method, the arithmetic mean is calculated
by using:

$$\bar{x} = a + \frac{\sum_{i=1}^{n} f_i d_i}{\sum_{i=1}^{n} f_i} = a + \frac{\sum f d}{N} \tag{4.4}$$

Here, a = the assumed mean

$d_i = (x_i - a)$, the deviation of each $x_i$ from the
assumed mean a.

$\sum f d$ is a simplified notation for $\sum_{i=1}^{n} f_i d_i$.

The computation procedure is clarified in the following example.


Example 4.5. Using the short-cut method, calculate the arithmetic
mean for the data in Example 4.4.

Solution

In this case, the assumed mean 'a' is taken as 4 because it is
nearly at the centre of the range of the variable x. The other
needed calculations are given in Table 4.3.

TABLE 4.3

Computation of arithmetic mean (Short-cut Method)

S.No.     Marks     No. of Students     Deviations     fd
            x              f            d = x - 4
1           1              3               -3          3 × (-3) = -9
2           2              5               -2          5 × (-2) = -10
3           3             10               -1          10 × (-1) = -10
4           4             12                0          12 × 0 = 0
5           5              8                1          8 × 1 = 8
6           6              6                2          6 × 2 = 12
7           7              4                3          4 × 3 = 12
8           8              2                4          2 × 4 = 8

                     Σf = N = 50                       Σfd = +11
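The columns of Table 4.3 can be reproduced with a short Python sketch of formula (4.4). The function name `grouped_shortcut_mean` is an assumption made for illustration; the assumed mean a = 4 is the one chosen in the solution.

```python
def grouped_shortcut_mean(values, frequencies, assumed_mean):
    """Short-cut method for a discrete frequency distribution, formula (4.4)."""
    # The fd column: frequency times deviation from the assumed mean.
    fd = [f * (x - assumed_mean) for x, f in zip(values, frequencies)]
    return assumed_mean + sum(fd) / sum(frequencies)


marks = [1, 2, 3, 4, 5, 6, 7, 8]
students = [3, 5, 10, 12, 8, 6, 4, 2]
print(round(grouped_shortcut_mean(marks, students, 4), 2))  # 4.22
```

The sum of the fd column is 11 and the total frequency is 50, so the result agrees with the direct method of Example 4.4.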


Using formula (4.4), the arithmetic mean 

z 5а 
х=а اج‎ 
T N 
11 
=4+ — 
T 50 

—4.22 marks 
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4.1.3. Some algebraic properties of arithmetic mean 
(i) The arithmetic mean is defined as 


or Zx—nx* (4.5) 


Therefore, if the number of observations (n) and their 
average (x) are both given, the sum of the observations (Ex) 
can be obtained by using formula (4.5.) 

(ii) The algebraic sum of the deviations of the variates 
from the arithmetic mean is equal to zero. Symbolically, 


n 
Z(xi— ¥)==0 
1-1 


ог 2(x—*)=0 (in simplified notation) 


(iii) If two groups have n_1 and n_2 variate values or
observations with \bar{x}_1 and \bar{x}_2 as their respective
arithmetic means, the combined arithmetic mean (\bar{x}_{12}) of
the two groups is obtained by using the formula:

    \bar{x}_{12} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}    (4.6)

The following examples will demonstrate the use of the
above properties.
Example 4.6. The arithmetic mean of age of 6 students is
12 years. The individual ages of 5 of them are 11, 9, 12, 13 and
13 years. Find the age of the sixth student.

Solution

Here, the mean age of 6 students is given to be 12 years,
i.e., n = 6 and \bar{x} = 12 years. Therefore, the sum of ages of
6 students, i.e.,

    \sum x = n\bar{x}
           = 6 x 12 = 72 years    [using formula (4.5)]

Also, the sum of ages of 5 students

    = 11 + 9 + 12 + 13 + 13
    = 58 years.

Thus, the age of the sixth student

    = 72 - 58 = 14 years.


Example 4.7. 25 boys and 15 girls in a class appeared in an
examination. The average grades for boys and girls were 5 and
6 respectively. Find the average grade of all the 40 students in
the class.

Solution

Here, we are given that

    n_1 = 25,  \bar{x}_1 = 5
    n_2 = 15,  \bar{x}_2 = 6

Using formula (4.6), the combined average grade of all the
40 students will be

    \bar{x}_{12} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}
                 = \frac{25 x 5 + 15 x 6}{25 + 15} = \frac{125 + 90}{40} = \frac{215}{40}
                 = 5.375 grade.
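
Formula (4.6) is a weighted average of the two group means, which a
short Python sketch makes explicit. The function name combined_mean
is a hypothetical choice for illustration only.

```python
def combined_mean(n1, mean1, n2, mean2):
    """Combined mean of two groups, formula (4.6)."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)


# Example 4.7: 25 boys with mean grade 5 and 15 girls with mean grade 6.
print(combined_mean(25, 5, 15, 6))  # 5.375
```

The combined mean lies nearer to the mean of the larger group, here
the boys.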


Example 4.8. The pass results of 40 students who took up
a class test are given below:

    Marks           :  4   5   6   7   8   9
    No. of students :  6   8   7   4   3   2

If the average marks for 40 students were 5.20, determine the
average marks of the students who failed.


Solution 
In the present example, the given frequency table is first
used to find the sum of marks of the students who passed.

TABLE 4.4

Computation of sum of marks for pass students

Marks   No. of students   fx
x       f

4       6                 24
5       8                 40
6       7                 42
7       4                 28
8       3                 24
9       2                 18

Total   \sum f = 30       \sum fx = 176

Thus, the sum of marks of students who passed = \sum fx = 176.

Again, since the average marks (\bar{x}) for all the 40 (n) students
is given to be 5.20, the sum of marks of all the 40 students
will be:

    n\bar{x} = 40 x 5.20 = 208 marks    [using formula (4.5)]

Therefore, the sum of marks of 10 students who failed

    = 208 - 176
    = 32

Hence, the average marks of 10 students who failed

    = \frac{32}{10} = 3.2 marks.
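
The reasoning of Example 4.8 (total marks from formula (4.5), minus
the marks of those who passed) can be sketched in Python as follows.
The variable names are hypothetical, and exact fractions are used
only to avoid rounding in the decimal 5.20.

```python
from fractions import Fraction

# Pass results of Example 4.8: marks and number of students who passed.
pass_marks = [4, 5, 6, 7, 8, 9]
pass_freqs = [6, 8, 7, 4, 3, 2]

n_all = 40                    # students who took the test
mean_all = Fraction("5.20")   # mean marks of all 40 students

sum_all = n_all * mean_all    # formula (4.5): sum = n * mean
sum_pass = sum(f * x for x, f in zip(pass_marks, pass_freqs))
n_fail = n_all - sum(pass_freqs)

mean_fail = (sum_all - sum_pass) / n_fail
print(float(mean_fail))  # 3.2
```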
4.1.4. Computation of arithmetic mean for grouped
data (continuous variable)

While grouping the presented data in the form of a
frequency distribution we lose information about the individual
values of a variable. For example, when we say that the
frequency in the score class 5-9 is 6, then we cannot say
how many students have scored 5, 6, 7, 8 or 9 marks. Further, we
observe that classes as such cannot be used in calculations. We,
therefore, make an assumption that the frequencies within each
class are uniformly distributed over its range, i.e., the
frequencies below and above the mid-point of the class are equal.
An equivalent alternative assumption is that all the frequencies
in a given class are concentrated at the mid-point of the class.
Thus in the above example, all 6 students are supposed to have
scored 7. With this assumption, each class is then represented
by its mid-point, denoted by x. Using these mid-points of the
classes and the corresponding frequencies, arithmetic mean can
be calculated for the given frequency distribution by using any
one of the following methods:

4.1.4. (a) Direct Method.
4.1.4. (b) Short-cut Method.
4.1.4. (c) Step-deviation Method.


4.1.4. (a) Direct Method 

After representing each class by its mid-point x, we use
formula (4.3) to calculate arithmetic mean as we did in the
case of grouped data for a discrete variable, i.e.,

    \bar{x} = \frac{\sum fx}{N}


Example 4.9. Compute arithmetic mean of scores for the
frequency distribution given in Table 2.9.


Solution 
The needed calculations are given in Table 4.5.

TABLE 4.5

Computation of arithmetic mean

S.No.   Scores    Mid-point   No. of students   fx
        classes   x           f

1       20-29     24.5        4                 98.00
2       30-39     34.5        6                 207.00
3       40-49     44.5        8                 356.00
4       50-59     54.5        12                654.00
5       60-69     64.5        9                 580.50
6       70-79     74.5        7                 521.50
7       80-89     84.5        4                 338.00

Total                         N = 50            \sum fx = 2755.00

    \bar{x} = \frac{\sum fx}{N} = \frac{2755.00}{50} = 55.10 marks.


4.1.4. (b) Short-cut Method for calculating Arithmetic
Mean

Formula (4.4) is used to calculate arithmetic mean by the
short-cut method. Here x stands for the mid-points of the classes.
The following example will clarify the computation:


Example 4.10. Compute arithmetic mean for the frequency 
distribution in Table 2.9. Use short-cut method. 


Solution 
Here, assumed mean is taken as a = 54.5 as it lies in the
middle of the given values of the mid-point x. The required
calculations are shown in Table 4.6.

TABLE 4.6

S.No.   Scores    Mid-point   f     Deviation        fd
        classes   x                 d = (x - 54.5)

1       20-29     24.5        4     -30              -120
2       30-39     34.5        6     -20              -120
3       40-49     44.5        8     -10              -80
4       50-59     54.5        12    0                0
5       60-69     64.5        9     10               90
6       70-79     74.5        7     20               140
7       80-89     84.5        4     30               120

Total                         N = 50                 \sum fd = +30

Therefore, the arithmetic mean

    \bar{x} = a + \frac{\sum fd}{N}
            = 54.5 + \frac{30}{50} = 54.5 + 0.6
            = 55.10 marks.


The value of \bar{x} obtained is, of course, the same as that
obtained by direct method in the previous example.

4.1.4. (c) Step-deviation Method

The computation of arithmetic mean is further simplified if the
deviations from assumed mean are divided by a common factor,
usually denoted by h. Usually this common factor is the
class-interval of the classes. The divided deviations are then
called step-deviations, denoted by u. The formula, called the
step-deviation formula, takes account of this division in
calculating the arithmetic mean and is given by:

    \bar{x} = a + \frac{\sum fu}{N} \times h    (4.7)

Here, u = the step-deviations = d/h.
      h = the common factor.

Other symbols have their usual meanings. The following
example will clarify the method.


Example 4.11. Compute arithmetic mean for the frequency 
distribution in Table 2.9, using step-deviation method. 


Solution 
In the following Table we see that the deviations taken from
assumed mean a = 54.5 have a common factor 10 in them.
Therefore, h will also be taken as 10 in this example. Other
needed calculations for formula (4.7) are presented in Table 4.7.

TABLE 4.7

Computation of arithmetic mean
(Step-deviation method)

S.No.   Scores    Mid-point   f     Step-deviation         fu
        classes   x                 u = (x - 54.5)/10

1       20-29     24.5        4     -3                     -12
2       30-39     34.5        6     -2                     -12
3       40-49     44.5        8     -1                     -8
4       50-59     54.5        12    0                      0
5       60-69     64.5        9     1                      9
6       70-79     74.5        7     2                      14
7       80-89     84.5        4     3                      12

Total                         N = 50                       \sum fu = +3

Now, using formula (4.7)

    \bar{x} = a + \frac{\sum fu}{N} \times h
            = 54.5 + \frac{3}{50} \times 10
            = 54.5 + 0.60
            = 55.10 marks.
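
The agreement of the three methods for the distribution of Table 2.9
can be checked with a short Python sketch. It is only an
illustration: the variable names are hypothetical, the class limits
and frequencies are those used in Examples 4.9 to 4.11, and exact
fractions are used so that the three results can be compared without
rounding.

```python
from fractions import Fraction

# Frequency distribution of Table 2.9 as used in Examples 4.9 to 4.11.
classes = [(20, 29), (30, 39), (40, 49), (50, 59), (60, 69), (70, 79), (80, 89)]
freqs = [4, 6, 8, 12, 9, 7, 4]

# Each class is represented by its mid-point x.
midpoints = [Fraction(lo + hi, 2) for lo, hi in classes]
N = sum(freqs)

a = Fraction(109, 2)  # assumed mean 54.5
h = 10                # common factor (the class-interval)

# Direct method: sum(f * x) / N.
direct = sum(f * x for x, f in zip(midpoints, freqs)) / N

# Short-cut method, formula (4.4): a + sum(f * d) / N, with d = x - a.
shortcut = a + sum(f * (x - a) for x, f in zip(midpoints, freqs)) / N

# Step-deviation method, formula (4.7): a + sum(f * u) / N * h, with u = d / h.
step = a + sum(f * ((x - a) / h) for x, f in zip(midpoints, freqs)) / N * h

print(float(direct), float(shortcut), float(step))  # 55.1 55.1 55.1
```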


Remark 

Comparing direct, short-cut and step-deviation methods of
calculating arithmetic mean for grouped data, we observe that
short-cut and step-deviation methods reduce the computational
work considerably. Therefore, either of these two methods should
be used for calculating the mean of grouped data. The step-deviation
method is specially recommended for frequency distributions
with uniform class-interval or for discrete frequency
distributions where the given values of the variable are equidistant.


4.2. The Median

In any given array, i.e., when observations are arranged
from lowest to highest or vice versa, every observation holds a
certain rank, be that the first, second, tenth or forty-fifth.
Obviously, a particular rank given to an observation has its
meaning among all the ranks of observations in the array. One
such point which divides the array into two equal parts, so that
exactly one-half of the observations are below, and one-half are
above that point, is called the median. Since the median clearly
denotes the position of an observation or variate in an array, it
is also called a position average. In the case of a grouped
frequency distribution, the median stands for that value of the
variable which divides the total frequency into two equal parts,
i.e., N/2 observations fall below and N/2 above the median.
Therefore, the student who stands exactly in the middle of the
range of scores, with a median score of 45, say, is such that
half the students are below him and half are above him.


4.2.1. Computation of Median 


Since median is a variate ranked in the middle of an array,
it can be located very easily with odd number of variates in the
array. But when the array contains an even number of variates
or observations, there will be two middle values and then the
arithmetic mean of these middle values shall be the median.
Various methods of locating median for ungrouped and grouped
data are given below:


4.2.2. Median in ungrouped data

In order to find median, we arrange the given ungrouped
data in ascending or descending order of magnitude, i.e., the
data are first put in the form of an array. Now, the middle
term, i.e., the median is indicated as the \frac{n+1}{2}th value in
this order or array, where n is the number of variates given. (If n
is odd, it is exactly the \frac{n+1}{2}th value; if it is even, it is
the mean of the \frac{n}{2}th and \frac{n+2}{2}th values.)


Example 4.12. Below are given mental ages of 9 students
in a class:

    7, 10, 6, 8, 13, 9, 10, 11, 6

Locate the median mental age.

Solution

As a first step, we arrange the given data in ascending
order of magnitude as given below:

    6, 6, 7, 8, 9, 10, 10, 11, 13

Now, median is the \frac{9+1}{2} = 5th value in this
arrangement, therefore

    Median = 9

Clearly, 9 is also the mid-point in the above arrangement as
exactly four observations are below and above this point.

Example 4.13. Suppose we are given mental ages of
8 students:

    7, 10, 6, 8, 13, 9, 10 and 11

Find the median mental age.

Solution

Here, the number of observations is 8, which is an even
number. Thus, there will be two middle values in the array and
their average will be the median. It is the arithmetic average of
4th and 5th values in the following order:

    6, 7, 8, 9, 10, 10, 11, 13

Therefore,

    Median = \frac{4th value + 5th value}{2} = \frac{9 + 10}{2}
           = 9.5

Clearly, the number of observations below and above the
median would now be four.
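
The rule for odd and even n can be sketched in Python as follows.
The function name median_ungrouped is hypothetical. The first list
is the data of Example 4.12; the second is an eight-value array whose
middle values are 9 and 10, taken here as an assumed reading of the
data of Example 4.13. The standard-library function
statistics.median is used only as an independent check.

```python
from statistics import median


def median_ungrouped(values):
    """Median of ungrouped data: middle value, or mean of the two middle values."""
    ordered = sorted(values)          # put the data in the form of an array
    n = len(ordered)
    if n == 0:
        raise ValueError("no observations")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # the (n + 1)/2-th value
    # mean of the n/2-th and (n + 2)/2-th values
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_case = [7, 10, 6, 8, 13, 9, 10, 11, 6]    # Example 4.12
even_case = [6, 7, 8, 9, 10, 10, 11, 13]      # assumed reading of Example 4.13

print(median_ungrouped(odd_case), median(odd_case))
print(median_ungrouped(even_case), median(even_case))
```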


4.2.3. Median in grouped data— (Discrete variable) 

In case of a discrete frequency distribution, the procedure of
computing median is fundamentally the same as that for
ungrouped data. Conforming to the definition of median, we
locate the value of the variable, i.e., a variate, so that exactly
one half of the frequencies, that is N/2, are below, and one
half above it. Thus median can be readily located with the help
of a ‘less than’ cumulative frequency table as explained in the
following example.

Example 4.14. Locate median score for a discrete frequency
distribution given in example 4.4.

Solution

A ‘less than’ cumulative frequency table is prepared first
for the location of median as given in Table 4.8.

TABLE 4.8

Computation of Median

Marks   No. of students   ‘Less than’
x       f                 cumulative frequencies

1       3                 3
2       5                 8
3       10                18
4       12                30
5       8                 38
6       6                 44
7       4                 48
8       2                 50

Total   N = 50

Here, the total frequency is N = 50. Thus, N/2 = 25. Now,
cumulating from the lower end, 25th frequency is obviously not
among the first 3, nor is it among 18, which is the cumulation
of frequencies corresponding to variates 1, 2 and 3 as shown in
Table 4.8 above. The fourth cumulation of 30, up to variate 4,
overshoots the mark of 25th frequency and as such 4 is the
median score in this case.
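
The cumulative-frequency search of Example 4.14 can be sketched in
Python. The function names are hypothetical. The handling of an even
N (averaging the values at ranks N/2 and N/2 + 1, as for ungrouped
data) is an assumption made here for completeness; in this example
both ranks fall on the same variate.

```python
from itertools import accumulate

# Discrete frequency distribution of Example 4.4.
marks = [1, 2, 3, 4, 5, 6, 7, 8]
freqs = [3, 5, 10, 12, 8, 6, 4, 2]


def value_at_rank(xs, fs, rank):
    """Return the variate whose 'less than' cumulative frequency first reaches rank."""
    for x, cum in zip(xs, accumulate(fs)):
        if cum >= rank:
            return x
    raise ValueError("rank exceeds total frequency")


def median_discrete(xs, fs):
    """Median of a discrete frequency distribution (xs in ascending order)."""
    n = sum(fs)
    if n % 2 == 1:
        return value_at_rank(xs, fs, (n + 1) // 2)
    # Assumption: for even N, average the N/2-th and (N/2 + 1)-th observations.
    low = value_at_rank(xs, fs, n // 2)
    high = value_at_rank(xs, fs, n // 2 + 1)
    return (low + high) / 2


print(list(accumulate(freqs)))        # the 'less than' cumulative frequencies
print(median_discrete(marks, freqs))  # 4.0
```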


4.2.4. Median in grouped data—(Continuous variable) 

In the case of a continuous frequency distribution, we first
locate the median class by cumulating the frequencies until
N/2th point is reached. Now the median value is located within
this class using an interpolation formula written as follows:

    Median = l_1 + \frac{N/2 - C}{f_m} \times h    (4.8)

Here, l_1 = the true lower limit of the median class.
      N/2 = one half of the total frequency.
      C   = the cumulative frequency of the class preceding
            the median class.
      f_m = the frequency of the median class.
      h   = the class-interval.

The following example will clarify the steps:


Example 4.15. Find median for the frequency distribution
given in Table 2.9.

Solution

From Table 4.9 it can be seen that N/2 = 50/2 = 25 is more
than the cumulative frequency 18, but is less than the
next cumulative frequency 30, hence the median class is ‘50-59’
with its class interval 10 and true limits ‘49.5-59.5’.

TABLE 4.9

Computation of Median

S.No.   Scores    No. of students   ‘Less than’ cumulative
        classes   f                 frequency

1       20-29     4                 4
2       30-39     6                 10
3       40-49     8                 18 = C
4       50-59     12 = f_m          30    (Median class)
5       60-69     9                 39
6       70-79     7                 46
7       80-89     4                 50

Total             N = 50

Therefore, l_1 = 49.5, C = 18, f_m = 12, h = 10 and N/2 = 25.

Putting these values in interpolation formula (4.8), we have

    Median = l_1 + \frac{N/2 - C}{f_m} \times h
           = 49.5 + \frac{25 - 18}{12} \times 10
           = 49.5 + 5.83
           = 55.33

4.2.5. Graphical location of median 
For grouped data, median can also be located very easily 
with the help of the cumulative frequency curves or ‘ogives’. 


The procedure involves the following steps: 


1. Draw ‘less than’ or ‘more than’ ogive for the given
   frequency distribution as discussed in chapter 3.
2. Mark a point corresponding to the value N/2 along
   y-axis, where N is the total frequency.
3. From this point, draw a line parallel to x-axis meeting
   the ogive at the point ‘A’ say.
4. Project this point A on to x-axis by drawing a straight
   line parallel to y-axis through point A.
5. This projected point on x-axis, say M, when read along
   with the scale, gives the value of the median.

The median of the frequency distribution in Table 2.9 is
located with the help of ‘less than’ ogive as shown in Fig. 4.1.
The median value so located is 55 approximately.

The median can also be located from the point of intersection
of ‘less than’ and ‘more than’ ogives. In this procedure,
we draw a perpendicular on x-axis through the intersecting
point of the two curves. The abscissa (distance along x-axis)
of the point, say M, on x-axis gives the median. Fig. 4.2 will
clarify the steps in locating median of the frequency
distribution in Table 2.9.

Fig. 4.1. Locating Median in ‘Less than’ Ogive.
(x-axis: Upper True Class Limits; y-axis: ‘Less than’ Cumulative Frequency)
Fig. 4.2. Locating Median using both ogives.
(x-axis: True Class Limits; y-axis: Cumulative Frequency)

4.3. The Mode

... scores occur only once. A distribution having one
mode is called unimodal.


Again, let the resulting scores be: 
42, 47, 58, 79, 72, 83, 96, 52, 57 and 65 


Then, there is no mode as every score has equal frequency.
Considering a third series of scores, say:

    56, 62, 47, 92, 50, 65, 75, 50, 82 and 75

we observe that there are two modes 50 and 75 as they
both occur twice while others occur only once. In such a
case the distribution is called bimodal. But still, in the
case of a bimodal distribution, the two modes need not
always have the same frequency, as will be clear from the
discussion of Fig. 4.4. However, it is clear
from the above discussion that in some cases the mode may be
absent while in others there may be one or more modes.

If the frequency curve of the given distribution is drawn
then the mode is the value of the variable at which the
curve reaches its maximum, i.e., it is that variate where the
concentration of values is maximum. The following Fig. 4.3
shows the modal value of a frequency distribution.

Fig. 4.3. Mode of the distribution
(x-axis: Variable; y-axis: Frequency)

In the case of a bimodal distribution, the concentration of
frequencies occurs at two points, so that there are two modes.
The shape of such a distribution will be as given in Fig. 4.4.

Fig. 4.4. Bimodal distribution.
(x-axis: Variable, with Mode 1 and Mode 2 marked; y-axis: Frequency)

Here it can be remarked that the occurrence of more than
one mode in a single distribution may be useful for further
statistical analysis, but the mode as a measure of central
tendency has little significance in the case of a bimodal or a
tri-modal distribution.


4.3.1. Determination of Mode (Ungrouped Data) 

For ungrouped data mode can be located simply by inspection.
Here the data are first arranged in the form of an array
and then we count the frequencies of each variate. The variate
which has the maximum frequency is the mode. Thus, in fact,
we always need grouping for locating a mode, since without
grouping there would be no frequencies. The following example
will clarify the points.

Example 4.16. Find the mode of the following array of
ungrouped scores:

    7, 7, 8, 9, 9, 9, 10, 11, 11, 12

Solution

Since there are only 10 observations in the array, we need
not put the data in a grouped form and mode can be determined
by inspection only.

Here, variate 9 has the maximum frequency 3. Therefore 9
is the mode.
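
Counting frequencies and picking the variate (or variates) with the
maximum frequency can be sketched in Python as below. The function
name modes is hypothetical. Following the discussion of the three
series of scores above, the sketch returns an empty list when every
score has the same frequency, and two values for a bimodal series.

```python
from collections import Counter


def modes(values):
    """Return the sorted list of modes; empty if all values are equally frequent."""
    if not values:
        return []
    counts = Counter(values)                 # frequency of each variate
    distinct_frequencies = set(counts.values())
    if len(counts) > 1 and len(distinct_frequencies) == 1:
        return []                            # no mode: equal frequencies
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


print(modes([7, 7, 8, 9, 9, 9, 10, 11, 11, 12]))           # Example 4.16
print(modes([42, 47, 58, 79, 72, 83, 96, 52, 57, 65]))     # no mode
print(modes([56, 62, 47, 92, 50, 65, 75, 50, 82, 75]))     # bimodal
```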


4.3.2. Mode in Grouped Data (Discrete variable) 

In the case of grouped data given in the form of a discrete
frequency distribution, a modal value is located simply by
inspection. For example, we group the data in example 4.16 in
the form of a discrete frequency distribution as shown in
Table 4.10.

TABLE 4.10

Computation of Mode

Scores
x           f

7           2
8           1
9 = Mode    3 = Maximum frequency
10          1
11          2
12          1

Here 9 is the variate having 3 as its maximum frequency.
Therefore 9 is the mode of the given discrete frequency
distribution.

4.3.3. Mode in Grouped Data (Continuous variable)

In a continuous frequency distribution, frequencies are
given in various classes. Now a class having maximum
frequency is called the modal class. The precise value of mode
is then determined within this modal class by using an
interpolation formula given below:

    Mode = l_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h    (4.9)

Where, l_1 = the true lower limit of the modal class.
       f_1 = the frequency of the modal class.
       f_0 = the frequency of the class preceding the
             modal class.
       f_2 = the frequency of the class succeeding the
             modal class.
       h   = class-interval of the modal class.

While using formula (4.9) for calculating mode, it is
necessary that the class-intervals are uniformly taken in the
formation of frequency distribution; otherwise one could, by
making a particular class-interval large enough, obtain an
arbitrary mode value, which is obviously a meaningless result. For
finding mode of distributions with unequal class-intervals, the
classes should be regrouped in uniform class-intervals before
using the formula (4.9). Computation of mode for a continuous
frequency distribution is clarified in the following example:


Example 4.17. Find mode of the frequency distribution
given in Table 2.9.

Solution

TABLE 4.11

Computation of Mode

Scores
classes   f

20-29     4
30-39     6
40-49     8 = f_0
50-59     12 = f_1   (Modal class)
60-69     9 = f_2
70-79     7
80-89     4

Here 50-59 is the modal class having maximum frequency
12. The true limits of the class are 49.5-59.5. Also, the
frequency of the modal class f_1 = 12 and that of preceding and
succeeding classes are f_0 = 8 and f_2 = 9 respectively. Uniform
class-interval h = 10. Now, the precise value of mode is obtained
by putting these values in formula (4.9):

    Mode = l_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h
         = 49.5 + \frac{12 - 8}{2 \times 12 - 8 - 9} \times 10
         = 49.5 + \frac{4}{7} \times 10
         = 49.5 + 5.71 = 55.21
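
Formula (4.9) can be sketched in Python for the same distribution.
The function name grouped_mode is hypothetical. Two choices in the
sketch are assumptions not stated in the text: if the modal class is
the first or last class, the missing neighbouring frequency is taken
as 0, and if several classes share the maximum frequency, the first
of them is used.

```python
def grouped_mode(lower_limits, freqs, h):
    """Mode by interpolation, formula (4.9), for uniform class-interval h."""
    i = max(range(len(freqs)), key=lambda k: freqs[k])   # modal class (first maximum)
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0                    # assumption: 0 if no preceding class
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0       # assumption: 0 if no succeeding class
    denominator = 2 * f1 - f0 - f2
    if denominator == 0:
        raise ValueError("formula (4.9) is undefined when f1 = f0 = f2")
    return lower_limits[i] + (f1 - f0) / denominator * h


# Table 2.9: true lower limits 19.5, ..., 79.5 and the class frequencies.
lower_limits = [19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5]
freqs = [4, 6, 8, 12, 9, 7, 4]

print(round(grouped_mode(lower_limits, freqs, 10), 2))  # 55.21
```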
4.4. Empirical relationship between Mean, Median
and Mode

Some empirical relationships exist between various measures
of central tendency, i.e., mean, median and mode. In the
case of unimodal distributions, these may be stated as under:

(i) In the case of a perfectly symmetrical distribution, mean,
median and mode are equal. The relationship is also shown
in Figure 4.5.

Fig. 4.5. Mean = Median = Mode (Symmetrical distribution)

(ii) For moderately asymmetrical distributions, the locations
of mean, median and mode are shown in Fig. 4.6. In the case
of positively skewed curve, the mean shall have the highest
value, the mode the lowest and the median will be about
one-third the distance from the mean towards the mode. On the
other hand, for a negatively skewed curve, the mean will be
the lowest, the mode the largest and median will still be
approximately at a one-third distance from mean towards
the mode.

Fig. 4.6. Locations of Mean, Median and Mode in positively and
negatively skewed distributions

These empirical relationships for moderately asymmetrical
distribution may be put in the form of a formula which reads:

    Mode = 3 Median - 2 Mean    (4.10)

If any two values out of the three are known for a
moderately asymmetrical curve, the third may be calculated by
using the relationship in (4.10).
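
Relationship (4.10) and its two rearrangements can be written as
small Python helpers. The function names and the sample values
(mean 7.5, median 6.0) are chosen here purely for illustration.
Since the relationship is empirical, the results are approximations
for moderately skewed distributions, not exact identities.

```python
def mode_from(mean, median):
    """Formula (4.10): Mode = 3 Median - 2 Mean."""
    return 3 * median - 2 * mean


def median_from(mean, mode):
    """Formula (4.10) solved for the median."""
    return (mode + 2 * mean) / 3


def mean_from(median, mode):
    """Formula (4.10) solved for the mean."""
    return (3 * median - mode) / 2


# Illustrative values: mean 7.5 and median 6.0.
estimated_mode = mode_from(7.5, 6.0)
print(estimated_mode)                      # 3.0
print(median_from(7.5, estimated_mode))    # 6.0
print(mean_from(6.0, estimated_mode))      # 7.5
```

Here the mode is below the median, which is below the mean, as
described above for a positively skewed curve.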


Remark 

By an empirical relationship we mean that such a relationship
has been observed to exist in most of the distributions
though there is no mathematical proof for it.


4.5. Characteristics of an ideal measure of Central
value

The following are the characteristics of an ideal average:

1. It should be rigidly defined and easily understandable.

2. Its calculation should be based on all the variate
   values or observations.

3. It should be capable of further algebraic treatment.

4. It should not be affected much by the extreme values
   of the variable.

5. It should be least affected by fluctuations of sampling.¹

4.6. Relative Merits and Demerits of Various Averages

The merits and demerits of various measures of central
value can be judged in the light of the characteristics listed
above. We consider them one by one.

¹ The field of inferential or inductive statistics is basically concerned
with generalizations and predictions. Here we can draw a number of
samples from the population with some specific sampling procedure.
The values of some particular statistic (mean, median etc.) calculated
with these samples differ among themselves. Fluctuation of sampling is
the cause of difference among the values of the statistic.
4.6.1. Merits and Demerits of Arithmetic Mean

1. It is rigidly defined and easily understandable.

2. Its computation is based on all the observations.

3. It is also capable of being handled algebraically as its
   computation is based on all the observations made.

4. Its value is least affected by fluctuations of sampling.

5. The only demerit of an arithmetic mean is that it is
   highly affected by extreme values of a variable. For
   example, suppose the income of a businessman is Rs. 2000
   per month and the incomes of his 4 employees are
   Rs. 150, 120, 130 and 140 respectively. Then the
   arithmetic mean of income,

       \bar{x} = \frac{2000 + 150 + 120 + 130 + 140}{5} = \frac{2540}{5}
               = Rs. 508

   is not a representative value for the given observations.
   Actually it gives equal weight to bigger and smaller
   observations, which distorts the average.

4.6.2. Merits and Demerits of Median

1. Median is also rigidly defined and easy to understand.

2. Its computation is not based on all the variates. For
   example, the median of an array of observations
       10, 25, 52, 62, 70
   is 52. Thus median value remains unaltered even if the
   observations 10 and 25 are replaced by any two
   observations less than 52 and the observations 62 and
   70 by any two values greater than 52.

3. Unlike mean, median is not capable of further algebraic
   treatment because its computation is not based on all
   the observations. For example, if median of an array
   of observations is known, say 30, then one can hardly
   guess about the sum of observations in the array.

4. The median is the only average available while dealing with a
   situation where numerical measurements are not given,
   but it is possible to rank the objects in some order.

5. It is not at all affected by extreme variates. This fact
   has already been discussed in point two above.

6. As compared to mean, it is affected more by
   fluctuations of sampling.
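
The contrast between the two averages in the presence of an extreme
value can be seen with the income figures used above. The sketch
below uses the standard-library functions statistics.mean and
statistics.median; the variable names are hypothetical.

```python
from statistics import mean, median

# Monthly incomes (Rs.): one businessman and his four employees.
incomes = [2000, 150, 120, 130, 140]
employees_only = [150, 120, 130, 140]

# The single extreme value pulls the mean far above the typical income,
# while the median stays among the employees' incomes.
print(mean(incomes), median(incomes))
print(mean(employees_only), median(employees_only))
```

Dropping the one extreme income changes the mean from 508 to 135,
whereas the median moves only from 140 to 135.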


4.6.3. Merits and Demerits of Mode 
1. It is not well defined though easily understood.

2. Its computation is not based on all the variates.

3. It is also not capable of being handled algebraically
   because its value is not based on all the observations.

4. It is not affected by extreme variates provided they
   do not belong to modal class.

5. As compared to mean, it is very much affected by
   fluctuations of sampling.


4.7. Limitations of the Averages 

As defined earlier an average is a single representative value 
which possesses not only the convenience of compactness, but 
also the inconvenience of brevity. Due to this fact, there is no 
all purpose average which can be universally used. Moreover, it 
is not always easy to decide which average is most suitable for 
a particular statistical analysis. The characteristics of different 
averages are listed in terms of merits and demerits. But these 
terms too have no fixed meaning. For example, arithmetic 
mean is suitable as its computation is based on all the variates. 
But we cannot use it even if a single variate or observation is 
missing in some investigation. Median or mode may then be 
used for calculating the central value of the given variates. 
Thus while using averages, their inherent limitations should
always be kept in mind and it is not always easy to decide
about a suitable average in a given field of investigation. We
shall consider a few examples to illustrate the selection of
an average.

It is generally believed that arithmetic mean is the best
average for all general purposes and should always be preferred
unless there is a definite reason for considering another
average. Arithmetic mean is preferred to other central
values in most cases because it is better suited to further
arithmetical computations. Secondly, the mean ordinarily
fluctuates less widely from sample to sample taken from the
same population. For descriptive purposes, the use of the
arithmetic mean is recommended when the distribution of
observations is symmetrical. Arithmetic mean is generally
suitable in reporting the average height, average score and
average income of a homogeneous class.

The use of median is recommended when exceptionally 
large or small values occur in the given observations. Secondly, 
if we want to study the average of such phenomena which are 
qualitative, e.g., intelligence, honesty etc., median is the most 
suitable average. It is also used when one is reporting the 
average performance of students on several tests in different 
states. 

Mode is a suitable average when a quick, approximate and
most typical measure of central value is desired. For example,
in business and commerce, terms like modal output per
machine, average size of ready-made garments and average
expenditure of a student in the hostel refer to the mode.


Problem Set 4 


1. Why are averages called measures of central
   tendency?

2. What are the various measures of central tendency?

3. What is the object of a central value?

4. What are the requisites of a good average?

5. Define median. Discuss its merits and demerits.

6. What are the merits and demerits of arithmetic mean?
   Why is it the most commonly used measure of a
   central value?

7. Define mode of a frequency distribution. Discuss its
   merits and demerits.

8. State the empirical relationship between mean, median
   and mode.

9. Find mean, median and mode for the following
   ungrouped scores of 15 students in a test:

       23, 29, 17, 36, 25, 23, 25, 40, 24, 22, 20,
       23, 24, 21, and 16.

   Now suppose a group of 5 students, with scores 30, 23,
   35, 27 and 39, is added to the original group. Determine
   mean, median and mode of the combined scores
   of 20 students.

10. The grades of a student on eight examinations were 78,
    80, 68, 72, 91, 84, 69, and 81. Find the median grade.

11. Find mean, median and mode for the following set of
    marks:

    (854,75. 2 6: 19:15. 18; 9 and 7.

    (b) 50.6, 48.7, 52.9, 46.7, 51.3 and 53.2.

12. 50 students took an examination. The frequency
    distribution of marks is as follows:

    Marks : 8 4 5 7,
    No. of Students : 6 ЈА 27915 6
    > оо

    Calculate mean using (i) direct method (ii) short-cut
    method.

13. Find median and mode for the frequency distribution
    given in problem 12.

14. 30 students appeared in a class test. Out of these, 2
    students scored 3, 6 scored 4, 9 scored 5, 7 scored 6, 4
    scored 7 and the rest scored 8 marks. Find mean,
    median and mode for the observed scores.

15. The following Table shows the distribution of marks of
    120 students on a final examination in a college
    competition.

    Marks           : 20-29  30-39  40-49  50-59  60-69  70-79  80-89
    No. of Students : 10     30     44     21     10     4      1

    Determine mean, median and mode of the distribution.

16. Find mean, median and mode from the following
    cumulative frequency Table:

    Marks below     : 80  70  60  50  40  30  20  10
    No. of students : 50  45  40  30  16  10  7   3

    Hint: First convert the given cumulative frequency
    Table into a simple frequency Table.

17. Two teachers of economics reported mean examination
    marks of 54 and 50 in their classes which consisted of
    25 and 35 students respectively. Find the mean marks
    for both the classes together.

18. Use the empirical formula to compute the mode for a
    moderately skewed distribution whose mean and median
    are 7.5 and 6.0 units respectively.

19. The mean grade of a student in an examination
    consisting of 6 subjects is 58. His grades in 5 subjects are
    52, 68, 60, 57 and 70. Determine his grade in the sixth
    subject.

20. Locate, graphically, the median of the frequency
    distribution in problem 15.

21. Which average will be suitable to compare:

    (i) heights of students in two classes.
    (ii) average sales for various years.
    (iii) intelligence of students.
    (iv) marks of students in two classes.
    (v) the average size of ready-made garments.
Answers 
9. Mean = 24.53, Median = 23.00 and Mode = 23.00
   Combined mean = 26.10, Median = 24 and Mode = 23

10. Median grade = 79

11. (a) Mean = 5.56, Median = 5 and Mode = 5 and 9
    (b) Mean = 50.57, Median = 50.95 and No mode

12. Mean = 5.40

13. Median = 5, Mode = 6

14. Mean = 5.37, Median = 5 and Mode = 5

15. Mean = 45.08, Median = 44.05 and Mode = 43.28

16. Mean = 44.8, Median = 46.42 and Mode = 46.67

17. Combined mean \bar{x}_{12} = 51.67

18. 3.0

19. 41

21. (i) Mean (ii) Median (iii) Median (iv) Mean (v) Mode


MEASURES OF DISPERSION


5. Introduction 

A measure of central tendency is a single representative 
value for a set of observations or a frequency distribution. But 
the central value or an average alone cannot describe the distri- 
bution adequately. For example, consider the scores of two 
groups of students on the same test : 


Scores of 1st group : 45, 57, 24, 41, 68, 84, 55, 62, 90 and 74 
Scores of 2nd group : 57, 60, 63, 68, 70, 52 and 50 


Here both groups have the same mean score of 60. But a 
close examination reveals that the two sets of scores differ 
widely from one another. The scores in the first group range 
from 24 to 90, while those in the second range from 50 to 70. This 
difference in range shows that the students in the second group 
are more homogeneous in scoring than those in the first. They 
seem to have nearly the same ability because their scores are less 
scattered from the mean score of 60 than the scores of the 
students in the first group. Thus, the scores in a class consisting 
of individuals of nearly the same ability fall around the same 
point on the scale, and the variability, dispersion or scatteredness 
of scores is relatively small. But if the class consists of individuals 
of widely differing abilities, the scores will range from very low 
to high, and accordingly the variability, dispersion or scatteredness 
of scores will be relatively large. Similarly, a student whose 
scores in different subjects are less scattered around his average 
score will be regarded as more consistent and homogeneous in 
scoring than those having greater scatteredness in their scores. 
Such a student seems to have nearly the same ability in all the 
subjects. Thus, while studying a distribution it is equally 
important to know whether the observations are clustered 
around or scattered away from the point of central tendency. 
Measures of dispersion help us in studying the extent to which 
observations are scattered about the average or the central value. 
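
The comparison of the two groups can be checked with a few lines of Python. This is only an illustrative sketch: the variable names and the helper `value_range` are our own and are not part of the text.

```python
from statistics import mean

# Scores of the two groups quoted in the introduction.
group_1 = [45, 57, 24, 41, 68, 84, 55, 62, 90, 74]
group_2 = [57, 60, 63, 68, 70, 52, 50]


def value_range(scores):
    """Return the largest score minus the smallest score."""
    return max(scores) - min(scores)


mean_1, mean_2 = mean(group_1), mean(group_2)
range_1, range_2 = value_range(group_1), value_range(group_2)

# Both means are equal, but the spread of the first group is much larger.
print("Group 1: mean =", mean_1, "range =", range_1)
print("Group 2: mean =", mean_2, "range =", range_2)
```

The equal means together with the unequal ranges show why an average alone cannot describe a distribution.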

The following are the three important measures of dispersion: 


5.1. The Range 
5.2. The Mean Deviation (M.D.) or Average Deviation (A.D.) 
5.3. The Standard Deviation (S.D.) and Variance 


5.1. The Range 

The range of a set of observations is defined as the difference 
between the largest and the smallest value. For grouped data, 
the range is the difference between the upper true limit of the 
highest class and the lower true limit of the lowest class. In 
this way, the range is a measure of variability or scatteredness 
of the variates or observations among themselves and does not 
give an idea about the spread of the observations around some 
central value. Symbolically, the range R is given by 

$$R = x_n - x_1 \qquad (5.1)$$

where $x_n$ = the largest of the observed values, 
$x_1$ = the smallest of the observed values. 

For grouped data, $x_n$ and $x_1$ will be taken as the upper true 
limit of the highest class and the lower true limit of the lowest class 
respectively. 


5.1.1. Computation of Range (Ungrouped data) 

Example 5.1. Calculate the range of the following set of scores. 

Scores : 28, 21, 41, 30, 33, 37, 19 and 27. 

Solution 
Here, the largest observed score $x_n = 41$ 
and the smallest observed score $x_1 = 19$. 
Therefore, Range $R = x_n - x_1 = 41 - 19 = 22$. 


5.1.2. Computation of Range (Grouped data) 
Example 5.2. Find the range of data in Table 2.9. 

Solution 
In this case, the upper true limit of the highest class 80-89 is 
$x_n = 89.5$ and the lower true limit of the lowest class 20-29 is 
$x_1 = 19.5$. 
Therefore, Range $R = x_n - x_1 = 89.5 - 19.5 = 70.0$. 
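
The two cases can be sketched in Python as follows. The function names are hypothetical, and the list of classes is an assumption: the text states only the lowest class (20-29) and the highest class (80-89) of Table 2.9, so the intermediate classes are filled in for illustration. True limits are taken as half a unit beyond the stated limits.

```python
def range_ungrouped(values):
    """Range of raw observations: largest value minus smallest value."""
    return max(values) - min(values)


def range_grouped(class_limits, gap=1.0):
    """Range of grouped data from stated class limits.

    class_limits is a list of (lower, upper) stated limits; gap is the
    distance between the upper limit of one class and the lower limit of
    the next, so the true limits lie gap/2 beyond the stated limits.
    """
    lowest_true = min(lower for lower, _ in class_limits) - gap / 2
    highest_true = max(upper for _, upper in class_limits) + gap / 2
    return highest_true - lowest_true


scores = [28, 21, 41, 30, 33, 37, 19, 27]            # Example 5.1
classes = [(20, 29), (30, 39), (40, 49), (50, 59),   # assumed layout of
           (60, 69), (70, 79), (80, 89)]             # Table 2.9

print(range_ungrouped(scores))
print(range_grouped(classes))
```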


5.2. The Mean Deviation or Average Deviation 

The average of the absolute deviations of every variate value 
from some central value, such as mean or median, is called the 
mean deviation or average deviation. Since some of the deviations 
about the mean are positive and others negative, their 
algebraic sum about the mean would be zero. That is why, in the 
definition of mean deviation, the average of the absolute (sign 
ignored) deviations of observations from some central value is 
taken. In general, the central value can be any measure of 
central tendency, but usually the term mean deviation indicates 
the average of absolute deviations taken from the mean. Hence, unless 
otherwise specified, mean deviation will be used to denote only 
the mean deviation from the mean. 

Symbolically, we can express mean deviation about mean 
and about median as follows : 

(i) The mean deviation from mean $\bar{x}$ : 

$$MD_{\bar{x}} = \frac{1}{n}\sum |x - \bar{x}| \qquad (5.2)$$

(ii) The mean deviation from median $M_d$ : 

$$MD_{M_d} = \frac{1}{n}\sum |x - M_d| \qquad (5.3)$$

Here n = the number of variates, 
$\sum |x - \bar{x}|$ = sum of absolute deviations taken from mean, 
$\sum |x - M_d|$ = sum of absolute deviations taken from median. 

For grouped data, the above formulae can be written as : 

$$MD_{\bar{x}} = \frac{1}{N}\sum f\,|x - \bar{x}| \qquad (5.4)$$

$$MD_{M_d} = \frac{1}{N}\sum f\,|x - M_d| \qquad (5.5)$$


Computation of mean deviation will be clear from the 
following examples. 


5.2.1. Computation of Mean Deviation (Ungrouped data) 


Example 5.3. Find mean deviation for the following set of 
variates. 

x = 55, 45, 63, 76, 67, 84 

Solution 

In order to find mean deviation we first calculate the mean of 
the given set of observations. The deviations and the absolute 
deviations are given in Table 5.1. 

TABLE 5.1 
Computation of Mean Deviation 

S. No.    x     Deviation from mean     Absolute deviation (sign ignored) 
                $d = (x - \bar{x})$     $|x - \bar{x}|$ 
1         55    −10                     10 
2         45    −20                     20 
3         63    −2                      2 
4         76    +11                     11 
5         67    +2                      2 
6         84    +19                     19 
n=6   $\sum x = 390$                    $\sum |x - \bar{x}| = 64$ 

Here, mean $\bar{x} = \frac{\sum x}{n} = \frac{390}{6} = 65$ 

Using formula (5.2), the mean deviation 

$MD_{\bar{x}} = \frac{1}{n}\sum |x - \bar{x}| = \frac{64}{6} = 10.67$ 


Example 5.4. Compute mean deviation from median for 
the following set of scores : 

Scores = 60, 68, 66, 74 and 62. 

Solution 

In this example, we first arrange the given set of scores in 
ascending order of magnitude as : 

60, 62, 66, 68 and 74 

Now the median is the value of the (n+1)/2 = (5+1)/2 = 3rd item 
in the above array. Therefore, median $M_d = 66$. The other 
calculations needed for calculating mean deviation from median 
are given below in Table 5.2. 

TABLE 5.2 
Computation of mean deviation from median 

S. No.    Scores x    Deviation from median    Absolute deviation (sign ignored) 
                      $(x - M_d)$              $|x - M_d|$ 
1         60          −6                       6 
2         68          +2                       2 
3         66          0                        0 
4         74          +8                       8 
5         62          −4                       4 
n=5                                            $\sum |x - M_d| = 20$ 

Therefore, using formula (5.3), the mean deviation from 
median : 

$MD_{M_d} = \frac{1}{n}\sum |x - M_d| = \frac{20}{5} = 4.$ 
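
Formulas (5.2) and (5.3) differ only in the central value used, so one small function can serve both. The sketch below is our own illustration (the name `mean_deviation` is hypothetical); it reuses the data of Examples 5.3 and 5.4.

```python
from statistics import mean, median


def mean_deviation(values, center):
    """Average of the absolute (sign ignored) deviations from `center`."""
    return sum(abs(x - center) for x in values) / len(values)


variates = [55, 45, 63, 76, 67, 84]   # Example 5.3
scores = [60, 68, 66, 74, 62]         # Example 5.4

md_about_mean = mean_deviation(variates, mean(variates))
md_about_median = mean_deviation(scores, median(scores))

print(round(md_about_mean, 2))
print(md_about_median)
```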


5.2.2. Mean Deviation for Grouped Data 

Example 5.5. Find mean deviation for the frequency 
distribution in Table 2.9. 

Solution 

As usual, the mean of the given distribution is first calculated to 
get the mean deviation. The required calculations are given in 
Table 5.3. 

Here, mean $\bar{x} = \frac{\sum fx}{N} = \frac{2755}{50} = 55.10$ 

Therefore, using formula (5.4), the mean deviation is 
obtained as : 

$MD_{\bar{x}} = \frac{1}{N}\sum f\,|x - \bar{x}| = \frac{676.0}{50} = 13.52.$ 


Example 5.6. Obtain mean deviation from median of the 
data in Table 2.9. 

Solution 

In this example, we first calculate the median of the frequency 
distribution. The other calculations are given in Table 5.4. 

The median is the value of the N/2 = 50/2 = 25th item and, therefore, 
lies in the 50-59 class, whose true limits are 49.5-59.5. Now, using 
the interpolation formula (4.8), the median 

$M_d = 49.5 + \frac{25 - 18}{12} \times 10 = 55.33.$ 

Therefore, using formula (5.5), the mean deviation about 
median is obtained as : 

$MD_{M_d} = \frac{1}{N}\sum f\,|x - M_d| = 13.57.$ 
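
The interpolation step for the median can be sketched as below. The function and parameter names are our own; the numbers are those quoted in the solution (lower true limit 49.5, 18 cases below the median class, 12 cases within it, class width 10 and N = 50).

```python
def grouped_median(lower_true_limit, cum_freq_below, class_freq,
                   class_width, total_freq):
    """Median of a frequency distribution by linear interpolation
    inside the class that contains the (N/2)th item."""
    position = total_freq / 2
    return lower_true_limit + (position - cum_freq_below) / class_freq * class_width


median_value = grouped_median(
    lower_true_limit=49.5,
    cum_freq_below=18,
    class_freq=12,
    class_width=10,
    total_freq=50,
)
print(round(median_value, 2))
```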


5.3. The Standard Deviation or S.D. and Variance 

The standard deviation is defined as the positive square root 
of the arithmetic mean of the squares of deviations of the 
observations from the arithmetic mean. It may also be called 
the 'root mean square deviation from mean' and is generally 
denoted by the small Greek letter $\sigma$ (sigma). Symbolically, the 
standard deviation for ungrouped data is defined as : 

$$\text{S.D.} = \sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$$

or, in simplified notation, 

$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} \qquad (5.6)$$

In the case of a frequency distribution, the standard deviation 
(S.D.) is defined as : 

$$\sigma = \sqrt{\frac{\sum_{i} f_i (x_i - \bar{x})^2}{\sum_{i} f_i}}$$

or, in simplified notation, 

$$\sigma = \sqrt{\frac{\sum f (x - \bar{x})^2}{N}} \qquad (5.7)$$

The square of the standard deviation is termed the variance. 
Therefore, the formula for variance for ungrouped data becomes 

$$\text{Variance} = \sigma^2 = \frac{\sum (x - \bar{x})^2}{n} \qquad (5.8)$$

Similarly, for a frequency distribution, the variance is given 
by 

$$\sigma^2 = \frac{\sum f (x - \bar{x})^2}{N} \qquad (5.9)$$


It may be noted that in calculating M.D. we disregard the signs 
of deviations and consider their absolute values only, whereas 
in finding S.D. we square the deviations. Moreover, the deviations 
used in calculating S.D. are always taken from the arithmetic 
mean only and no other central value is used for the purpose. 


5.3.1. Computation of S.D. (Ungrouped Data) 
There are two ways of computing S.D. for ungrouped data : 

5.3.1. (a) Direct method. 
5.3.1. (b) Short-cut method. 


5.3.1. (a) Direct method 
This method uses formula (5.6) for finding S.D., which 
involves the following steps : 

(i) Calculate the arithmetic mean $\bar{x}$ of the given data. 
(ii) Obtain the deviation of each variate from $\bar{x}$ as $(x - \bar{x})$. 
(iii) Square each deviation to get $(x - \bar{x})^2$. 
(iv) Obtain the sum of squared deviations as $\sum (x - \bar{x})^2$. 
(v) Using formula (5.8), calculate variance $\sigma^2$ as the 
arithmetic mean of the squared deviations, i.e., 
$\sigma^2 = \frac{\sum (x - \bar{x})^2}{n}$ 
(vi) The positive square root of variance is then calculated 
to get S.D. as : 
$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$ 

The following example will show how these steps are 
followed : 

Example 5.7. From the following list of test-scores, 
50, 48, 54, 66, 63, 60, 55 and 68, 
find the variance and S.D. of scores. 

Solution 
Calculations needed for calculating variance and S.D. are 
given in Table 5.5. 

TABLE 5.5 
Computation of variance and S.D. (Ungrouped Data) 

S. No.    Scores x    Deviation from mean    Square of deviation 
                      $(x - \bar{x})$        $(x - \bar{x})^2$ 
1         50          −8                     64 
2         48          −10                    100 
3         54          −4                     16 
4         66          +8                     64 
5         63          +5                     25 
6         60          +2                     4 
7         55          −3                     9 
8         68          +10                    100 
n=8   $\sum x = 464$                         $\sum (x - \bar{x})^2 = 382$ 

Here, mean $\bar{x} = \frac{\sum x}{n} = \frac{464}{8} = 58$ 

Using formula (5.8), the variance 

$\sigma^2 = \frac{\sum (x - \bar{x})^2}{n} = \frac{382}{8} = 47.75$ 

Now, using formula (5.6), the S.D. will be : 

$\sigma = \sqrt{47.75} = 6.91.$ 
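
The steps of the direct method translate almost line by line into Python. The sketch below is illustrative only (the function name is ours). Note that, following formula (5.8), the divisor is n and not n − 1; Python's standard library offers the same quantity as `statistics.pvariance`, which is used here as a cross-check.

```python
from math import sqrt
from statistics import pvariance


def variance_direct(values):
    """Variance by the direct method: mean of squared deviations from the mean."""
    n = len(values)
    mean = sum(values) / n                          # step (i)
    deviations = [x - mean for x in values]         # step (ii)
    squared = [d ** 2 for d in deviations]          # step (iii)
    return sum(squared) / n                         # steps (iv) and (v)


scores = [50, 48, 54, 66, 63, 60, 55, 68]           # Example 5.7
variance = variance_direct(scores)
sd = sqrt(variance)                                 # step (vi)

print(variance, round(sd, 2))
```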


5.3.1. (b) Short-cut Method 

In most cases the arithmetic mean of the given data 
happens to be a fractional value, and then the process of taking 
deviations and squaring them becomes tedious and time-consuming 
in the computation of S.D. To facilitate computation 
in such situations, the deviations may be taken from an 
assumed mean. The adjusted short-cut formula for calculating 
S.D. will then be 

$$\text{S.D.} = \sigma = \sqrt{\frac{\sum d^2}{n} - \left(\frac{\sum d}{n}\right)^2} \qquad (5.10)$$

where d = the deviation of the variate from an assumed mean, 
say A, i.e., d = (x − A), 
$d^2$ = the square of the deviation, 
$\sum d$ = the sum of the deviations, 
$\sum d^2$ = the sum of the squared deviations, 
n = the number of variates. 

The computation procedure is clarified in the following 
example : 

Example 5.8. Find S.D. of the data in Example 5.7. Use the 
short-cut method. 

Solution 
Let us take assumed mean A = 60. The deviations and 
squares of deviations needed in formula (5.10) are given in 
Table 5.6. 

TABLE 5.6 
Computation of S.D. (Short-cut method) 

S. No.    Scores x    Deviation from assumed mean    Square of deviation 
                      d = (x − 60)                   $d^2$ 
1         50          −10                            100 
2         48          −12                            144 
3         54          −6                             36 
4         66          +6                             36 
5         63          +3                             9 
6         60          0                              0 
7         55          −5                             25 
8         68          +8                             64 
n=8                   $\sum d = -16$                 $\sum d^2 = 414$ 

Putting the values from the table in formula (5.10), the S.D. 

$\sigma = \sqrt{\frac{\sum d^2}{n} - \left(\frac{\sum d}{n}\right)^2} = \sqrt{\frac{414}{8} - \left(\frac{-16}{8}\right)^2} = \sqrt{51.75 - 4.00} = 6.91,$ 

which is the same result as we obtained by using deviations 
from the arithmetic mean in the previous example. But the short-cut 
method tends to reduce the calculation work in situations 
where the arithmetic mean is not a whole number. 
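
A short Python sketch of formula (5.10) is given below; the function name and the extra assumed means tried in the loop are our own choices for illustration. It suggests that the result does not depend on which assumed mean is picked.

```python
from math import sqrt


def sd_shortcut(values, assumed_mean):
    """S.D. from deviations about an assumed mean A, as in formula (5.10)."""
    n = len(values)
    d = [x - assumed_mean for x in values]
    sum_d = sum(d)
    sum_d2 = sum(v * v for v in d)
    return sqrt(sum_d2 / n - (sum_d / n) ** 2)


scores = [50, 48, 54, 66, 63, 60, 55, 68]   # data of Example 5.7

# A = 60 is the assumed mean of Example 5.8; the others are arbitrary.
results = {a: sd_shortcut(scores, a) for a in (50, 58, 60, 65)}
for a, sd in results.items():
    print("A =", a, "S.D. =", round(sd, 2))
```

The correction term $(\sum d / n)^2$ is what compensates for taking deviations from A instead of the true mean; when A equals the mean it vanishes.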


5.3.2. Computation of S.D. in grouped data 


For calculating S.D. for grouped data any of the following 
methods may be applied. 

5.3.2 (a). Direct method 
5.3.2 (b). Short-cut method 
5.3.2 (c). Step-deviation method. 


5.3.2. (a) Direct Method 

This method uses the actual arithmetic mean in taking 
deviations of the given observations, i.e., we find $(x - \bar{x})$. These 
deviations are then squared and multiplied by the corresponding 
frequencies to get $\sum f (x - \bar{x})^2$. The formula (5.7) is then used 
to calculate S.D. 

However, in practice, the method is rarely used as the 
arithmetic mean is generally a fractional value. 


Example 5.9. Find S.D. of the frequency distribution in 
Table 2.9. 


Solution 
We have already calculated $\bar{x} = 55.10$ for this data in Example 
5.5. Other required calculations are given in Table 5.7. 
Therefore, by using formula (5.7), we get S.D. 

$\sigma = \sqrt{\frac{\sum f (x - \bar{x})^2}{N}} = \sqrt{\frac{14082.0}{50}} = 16.78.$ 



5.3.2. (b) Short-cut Method 
In order to avoid the difficulties in computation faced in the 
direct method we consider the deviations of observations from 
a suitably chosen assumed mean, say A. The following formula 
is then used for calculating the S.D. 

$$\sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2} \qquad (5.11)$$

where d is the deviation from the assumed mean. 

The following steps are then involved in the computation 
of standard deviation : 

(i) Obtain deviations of the variates from assumed mean 
A as d = (x − A). 

(ii) Multiply these deviations by the corresponding frequencies 
to get the column fd. The sum of this column gives 
$\sum fd$. 

(iii) Square the deviations to get $d^2$. 

(iv) Multiply these squared deviations also by the corresponding 
frequencies to get the column $fd^2$. The sum of 
this column will be $\sum fd^2$. 

(v) Use formula (5.11) to find S.D. 


Example 5.10. Using short-cut method find S.D. of the data 
in Table 2.9. 


Solution 
Let us take assumed mean A = 54.5. Other calculations 
needed for calculating S.D. are given in Table 5.8. 

Now, by using formula (5.11), the S.D. 

$\sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2}$ 

Putting values from the table, 

$\sigma = \sqrt{\frac{14100}{50} - \left(\frac{30}{50}\right)^2} = \sqrt{282 - 0.36} = 16.78$ 


5.3.2. (c) Step-Deviation Method 
In every short-cut method of calculation our main aim is to 
simplify the deviations so that computations become easier. In 
the fourth column of Table 5.8 we observe that the calculations 
can be further simplified if the deviations (d) are divided by a 
common factor 10 (in general, it is the class-interval h) to get the 
step deviations d'. However, this division is then reflected 
in the formula for calculating S.D., which now reads : 

$$\sigma = h \times \sqrt{\frac{\sum f d'^2}{N} - \left(\frac{\sum f d'}{N}\right)^2} \qquad (5.12)$$


Steps involved in calculating S.D. by using this method 
will be clear from the following example. 


Example 5.11. Using step-deviation method find S.D. of the 
data in Table 2.9. 

Solution 

Here deviations are first taken from assumed mean A = 54.5. 
These deviations are then divided by a common factor h = 10 to 
get the step-deviations (d') in column four of Table 5.9. 

Now, by using formula (5.12), the S.D. 

$\sigma = h \times \sqrt{\frac{\sum f d'^2}{N} - \left(\frac{\sum f d'}{N}\right)^2}$ 

Putting values from the table, 

$\sigma = 10 \times \sqrt{\frac{141}{50} - \left(\frac{3}{50}\right)^2} = 10 \times \sqrt{2.82 - 0.0036} = 10 \times 1.678 = 16.78.$ 

Now, having demonstrated the different methods of computation, 
it is easy to see that the step-deviation method is, in general, 
best for calculating S.D. of grouped data. 
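
Since Table 2.9 itself is not reproduced here, the sketch below works only from the column totals used in Example 5.11, taken as N = 50, $\sum fd' = 3$, $\sum fd'^2 = 141$ and h = 10. The function name is hypothetical. It also recovers the unscaled totals ($\sum fd = h\sum fd'$ and $\sum fd^2 = h^2 \sum fd'^2$) to show that formulas (5.11) and (5.12) agree.

```python
from math import sqrt


def sd_from_totals(n, sum_fd, sum_fd2, scale=1.0):
    """S.D. from column totals of deviations; `scale` is the class-interval h
    when the totals are step deviations, and 1 otherwise."""
    return scale * sqrt(sum_fd2 / n - (sum_fd / n) ** 2)


N, h = 50, 10
sum_fd_step, sum_fd2_step = 3, 141          # totals of f*d' and f*d'^2

sd_step = sd_from_totals(N, sum_fd_step, sum_fd2_step, scale=h)

# Undo the division by h to get the totals of the plain short-cut method.
sum_fd = h * sum_fd_step                    # 30
sum_fd2 = h ** 2 * sum_fd2_step             # 14100
sd_short = sd_from_totals(N, sum_fd, sum_fd2)

print(round(sd_step, 2), round(sd_short, 2))
```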


5.3.3. Properties of S.D. 


5.3.3 (a). Effect upon S.D. when a constant is added to 
each variate 

We discuss this effect upon S.D. by considering an illustration. 
Table 5.10 shows the original scores (x) of 5 students in a 
test with an arithmetic mean score of 6. New scores (x') are 
also given in the same table, which we obtain by adding a 
constant 3 to each original score. Using formula (5.6), we 
observe that the S.D. of the scores remains the same in both the 
situations. 

Here $\bar{x} = \frac{\sum x}{n} = \frac{30}{5} = 6$ 

$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{10}{5}} = \sqrt{2} = 1.41$ 

$\sigma' = \sqrt{\frac{\sum (x' - \bar{x}')^2}{n}} = \sqrt{\frac{10}{5}} = \sqrt{2} = 1.41$ 

That is, $\sigma = \sigma'$. 

Thus, the value of S.D. of a frequency distribution remains 
unchanged if each variate value is increased by the same constant 
value. The same is true when a constant value is subtracted from 
each variate. 


5.3.3 (b). Effect upon S.D. when each variate is multiplied 
by the same constant 

To observe the effect upon S.D. of multiplying variate values 
by the same constant, the new scores in Table 5.11 are obtained 
by multiplying each original score by 5. 

TABLE 5.11 
Effect upon S.D. when each variate value is multiplied 
by a constant 

S. No.    Original scores x    New scores x' = 5x    $(x' - \bar{x}')$    $(x' - \bar{x}')^2$ 
1         8                    40                    +10                  100 
2         7                    35                    +5                   25 
3         6                    30                    0                    0 
4         5                    25                    −5                   25 
5         4                    20                    −10                  100 
n=5   $\sum x = 30$        $\sum x' = 150$                                $\sum (x' - \bar{x}')^2 = 250$ 

Here $\bar{x}' = \frac{\sum x'}{n} = \frac{150}{5} = 30$ 

Now, using formula (5.6), the S.D. of new scores, say 

$\sigma' = \sqrt{\frac{\sum (x' - \bar{x}')^2}{n}} = \sqrt{\frac{250}{5}} = \sqrt{50} = 5\sqrt{2} = 5 \times 1.41 = 7.05$ 

That is, $\sigma' = 5\sigma$. 

Thus, for a given set of observations or a frequency distribution, 
if each observed value is multiplied or divided by a 
constant value, the S.D. of the new observations will also be 
multiplied or divided by the same constant. 
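
Both properties can be checked numerically. The sketch below uses `statistics.pstdev` from the Python standard library, which divides by n as formula (5.6) does; the list names are our own, and the original scores are those of Table 5.11.

```python
from statistics import pstdev

original = [8, 7, 6, 5, 4]
shifted = [x + 3 for x in original]   # a constant 3 added to each score
scaled = [x * 5 for x in original]    # each score multiplied by 5

sd_original = pstdev(original)
sd_shifted = pstdev(shifted)
sd_scaled = pstdev(scaled)

# Adding a constant leaves S.D. unchanged; multiplying scales it.
print(round(sd_original, 2), round(sd_shifted, 2), round(sd_scaled, 2))
```

Working with unrounded values gives $5\sqrt{2} \approx 7.07$ for the multiplied scores; the figure 7.05 in the text comes from multiplying the already rounded value 1.41 by 5.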


5.3.3 (c). Combined Standard Deviation 
If two frequency distributions have means $\bar{x}_1$ and $\bar{x}_2$ and 
standard deviations $\sigma_1$ and $\sigma_2$ respectively, then the combined 
S.D., denoted by $\sigma_{12}$, of the two distributions is obtained by 
using : 

$$\sigma_{12} = \sqrt{\frac{N_1(\sigma_1^2 + D_1^2) + N_2(\sigma_2^2 + D_2^2)}{N_1 + N_2}} \qquad (5.13)$$

where $N_1$ = total frequency or the number of observations 
in the first frequency distribution, 
$N_2$ = total frequency or the number of observations 
in the second frequency distribution, 
$D_1 = (\bar{x}_{12} - \bar{x}_1)$ = the difference between the combined mean 
and the mean of the first frequency distribution, 
$D_2 = (\bar{x}_{12} - \bar{x}_2)$ = the difference between the combined mean 
and the mean of the second frequency distribution. 

The formula (5.13) can be extended to any number of 
distributions. For example, in the case of three distributions, 
it will be 

$$\sigma_{123} = \sqrt{\frac{N_1(\sigma_1^2 + D_1^2) + N_2(\sigma_2^2 + D_2^2) + N_3(\sigma_3^2 + D_3^2)}{N_1 + N_2 + N_3}} \qquad (5.14)$$

Here, $D_1 = (\bar{x}_{123} - \bar{x}_1)$, 
$D_2 = (\bar{x}_{123} - \bar{x}_2)$, 
$D_3 = (\bar{x}_{123} - \bar{x}_3)$, 
$\bar{x}_{123}$ = the combined mean of the three distributions. 
Other symbols have their usual meanings. 


Example 5.12. On a test students' scores had mean 52 
and S.D. 15. For improving the results, a score of five was 
added to the score of each student. Find the S.D. of the improved 
scores. 

Solution 

In the first property of S.D. we observed that it remains 
unchanged if each score is increased or decreased by a constant 
value. Here a constant score of 5 is added to each score and, 
therefore, the new S.D. will remain the same, i.e., 15. 


Example 5.13. On an arithmetic examination a class 
teacher evaluated a class of 50 students on a paper with maximum 
marks 50. The mean score was 24 and S.D. = 10. Later, 
the maximum score for the paper was raised to 100 and the 
score of each student too was doubled. Find the S.D. of these 
new scores. 

Solution 
Here, since each score is being multiplied by two (a constant 
value), the S.D. for these new scores too will be doubled, i.e., 
the S.D. for new scores = 2 × 10 
= 20. 

Example 5.14. On an achievement test for two classes A 
and B, the mean and S.D. of scores are as follows : 

            Mean    S.D.    No. of students 
Class A     75      20      40 
Class B     65      25      60 

Find (i) the combined mean score of the two classes. 
(ii) the combined S.D. of the scores of the two classes. 

Solution 
Symbolically, the given data can be put as : 

$N_1 = 40$, $\bar{x}_1 = 75$, $\sigma_1 = 20$ 
$N_2 = 60$, $\bar{x}_2 = 65$, $\sigma_2 = 25$ 

Using formula (4.6), the combined mean score will be 

$\bar{x}_{12} = \frac{N_1 \bar{x}_1 + N_2 \bar{x}_2}{N_1 + N_2} = \frac{40 \times 75 + 60 \times 65}{40 + 60} = 69$ 




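Therefore, $D_1 = (\bar{x}_{12} - \bar{x}_1) = 69 - 75 = -6$ 
$D_2 = (\bar{x}_{12} - \bar{x}_2) = 69 - 65 = +4$ 

Now, by using formula (5.13), the combined S.D. of scores 
of the two classes is obtained as 

$\sigma_{12} = \sqrt{\frac{N_1(\sigma_1^2 + D_1^2) + N_2(\sigma_2^2 + D_2^2)}{N_1 + N_2}} = \sqrt{\frac{40(400 + 36) + 60(625 + 16)}{100}} = \sqrt{559} = 23.64.$ 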
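
The combined mean and combined S.D. can be computed together, and the same loop covers two, three or more distributions as in formulas (5.13) and (5.14). This is a sketch under our own naming; each group is passed as a hypothetical (N, mean, S.D.) triple.

```python
from math import sqrt


def combined_mean_sd(groups):
    """Combined mean and S.D. of several distributions.

    groups: list of (n, mean, sd) triples, one per distribution.
    """
    total = sum(n for n, _, _ in groups)
    combined_mean = sum(n * m for n, m, _ in groups) / total
    # Each group contributes N * (sigma^2 + D^2), with D = combined mean - group mean.
    pooled = sum(n * (s ** 2 + (combined_mean - m) ** 2) for n, m, s in groups)
    return combined_mean, sqrt(pooled / total)


# Classes A and B of Example 5.14.
mean_ab, sd_ab = combined_mean_sd([(40, 75, 20), (60, 65, 25)])
print(mean_ab, round(sd_ab, 2))
```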




5.4. Measurements of Relative Dispersion (Coefficient 
of variation) 

It is now clear that measures of dispersion give us an idea 
about the extent to which variates are scattered around their 
central value. Therefore, two frequency distributions having 
the same central values can be compared directly with the help 
of various measures of dispersion. Suppose, for example, that on a test 
in a class boys have a mean score $\bar{x}_1 = 65$ with S.D. $\sigma_1 = 15$ and 
girls have a mean score $\bar{x}_2 = 65$ with S.D. $\sigma_2 = 10$. Clearly, the girls, who 
have a lesser S.D., are more consistent in scoring around their 
average score than the boys. 

On the other hand we have situations when two or more 
distributions having unequal means or different units of 
measurement are to be compared in respect of their scatteredness 
or variability. For making such comparisons we use the 
coefficient of relative dispersion or coefficient of variation (C.V.) 
given by 

$$\text{C.V.} = \frac{\sigma}{\bar{x}} \times 100 \qquad (5.15)$$

Symbols $\sigma$ and $\bar{x}$ have their usual meanings. 

The coefficient of variation is useful for comparing the 
variability, homogeneity, uniformity and consistency of two or 
more distributions. The distribution having a greater C.V. is 
more variable than the other, and the distribution with a lesser 
C.V. shows more consistency, uniformity and homogeneity. 
The following examples will show the use of C.V. 


Example 5.15. From the data given in Example 5.14, 
compare : 

(i) the performance of the two classes on the test. 
(ii) the variability of scores in the two classes. 

Solution 
(i) Since the mean score of Class A is greater than that of 
Class B, Class A has given a better performance on 
the test. 
(ii) For comparing the two classes in respect of variability 
among scores, coefficients of variation are calculated in the 
two cases. 

C.V. for Class A $= \frac{\sigma_1}{\bar{x}_1} \times 100 = \frac{20}{75} \times 100 = 26.67$ 

Similarly, 
C.V. for Class B $= \frac{\sigma_2}{\bar{x}_2} \times 100 = \frac{25}{65} \times 100 = 38.46.$ 

Therefore, the variability of scores is greater in Class B. The 
students in Class A, having a lesser C.V., are more consistent 
in scoring around their average score as compared to the 
students in Class B. 


Example 5.16. A group of 10-year-old boys has a mean 
height of 137 cm. with a $\sigma$ of 6.2 cm. The same group of boys 
has a mean weight of 30 kg. with a $\sigma$ of 3.5 kg. In which trait 
is the group more variable ? 

Solution 
In the present example, the two distributions differ not only in respect 
of mean but also in units of measurement, which are cm. in 
the first case and kg. in the second. Coefficient of variation may 
be used to compare the variability of the group in such a 
situation. We thus calculate : 

C.V. for first trait (cm.) $= \frac{\sigma_1}{\bar{x}_1} \times 100 = \frac{6.2}{137} \times 100 = 4.53$ 

C.V. for second trait (kg.) $= \frac{\sigma_2}{\bar{x}_2} \times 100 = \frac{3.5}{30} \times 100 = 11.67.$ 

Thus, the boys are more variable on weight than on height. 
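
Formula (5.15) is a one-line computation. The sketch below, with a function name of our own choosing, applies it to the height and weight figures of Example 5.16 (taking the S.D. of height as 6.2 cm.).

```python
def coefficient_of_variation(sd, mean):
    """C.V. = (S.D. / mean) x 100; a unit-free measure of relative dispersion."""
    return sd / mean * 100


cv_height = coefficient_of_variation(sd=6.2, mean=137)   # cm.
cv_weight = coefficient_of_variation(sd=3.5, mean=30)    # kg.

more_variable = "weight" if cv_weight > cv_height else "height"
print(round(cv_height, 2), round(cv_weight, 2), more_variable)
```

Because the units cancel in the ratio, the two values can be compared directly even though one trait is measured in cm. and the other in kg.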


5.5. Characteristics for an ideal measure of dispersion 


The desiderata for an ideal measure of dispersion are the 
same as those for an ideal measure of central value as stated 
in section 4.5. Now, in the following sub-sections we discuss 
the merits and demerits of various measures of dispersion in 
the light of these requisites. 


5.5.1. Merits and demerits of range 

1. Range is the simplest measure of dispersion. It is easy 
to calculate and rigidly defined. 
2. Its computation is based on only two extreme observations 
which may be purely accidental. 
3. It is not capable of being handled algebraically as its 
computation is not based on all the observations. 
4. Its value, being based on extreme values, is highly 
affected by extreme observations. 
5. Its value is highly affected by fluctuations of sampling 
in most of the cases. 

Due to these demerits the range is an unreliable measure of 
dispersion, but it may be used in situations where 
we are interested in roughly comparing two or more sets of 
observations for variability. 


5.5.2. Merits and demerits of M.D. 
1. Mean deviation is easy to interpret. 
2. It is easy to calculate and rigidly defined. 
3. Its computation is based on all the observations. 
4. Since it is calculated about mean or median, it has all 
the merits and demerits of these central values. 
5. It is less affected by the extreme observations as 
compared to S.D. 
6. It ignores the positive and negative signs of deviations 
and, therefore, is not amenable to further algebraic 
treatment. 

Due to certain inherent drawbacks the mean deviation is 
also not a common measure of dispersion. But it is very 
useful when extreme deviations influence standard deviation 
unduly. 

5.5.3. Merits and demerits of S.D. 
1. It is rigidly defined but comparatively difficult to 
calculate. 
2. Its value is based on all the observations. 
3. It is capable of further algebraic treatment. 
4. It is not much affected by sampling fluctuations. 
5. It gives greater weight to extreme observations. 
6. Standard deviation is based on the arithmetic mean; therefore, 
it has all the characteristics of this central value. 

Since standard deviation possesses most of the properties of 
a good measure of dispersion, it is widely used in statistics. 
Standard deviation plays a vital role in sampling and 
correlation theory. 


Problem Set 5 


Explain with suitable examples the term 'dispersion'. 
What purpose does a measure of dispersion serve ? 
Describe the various measures of dispersion. 

Define the terms 'standard deviation' and 'mean 
deviation'. How is standard deviation a better measure 
of dispersion than mean deviation ? 

Discuss the utility of a coefficient of variation. 

Discuss the relative merits and demerits of various 
measures of dispersion. 

Calculate (i) Range (ii) Mean deviation and (iii) S.D. 
for the following set of scores. 


20 39 40 25 36 15 35 30 23 27 

Find mean deviation and standard deviation for the 
following frequency distribution. 


x: 5 10 15 20 25 30 
f: 2 7 10 15 11 5 

Find the standard deviation of the weights of the 100 
male students. 


Weight (kg) : 60—62 63—65 66—68 69—71 72—74 
No. of students: 5 18 42 27 8 


Calculate standard deviation for the set of scores 
aE “ОРЕК „5 


What will be the standard deviation of the new set of 
scores obtained by adding 5 to each of the scores in 
the above set ? 

In problem 9, if the new set of scores is obtained by 
doubling each score in the set, how will the S.D. for the 
new scores be affected ? 

On a final examination in statistics the mean mark of 
a group of 150 students was 78 and the S.D. 8.0. In 
Economics, however, the mean mark of the group was 
73 and the S.D. 7.6. In which subject was there the 
greater (a) absolute dispersion (b) relative dispersion ? 

A person calculated the S.D. of given scores awarded 
out of a maximum score of 50. Later on he was asked 
to double each score as scores were to be awarded out 
of a maximum score of 100. How will he modify the 
calculation of S.D.? 

Two classes consisting of 40 and 60 male students have 
mean scores 25 and 30 respectively. If their standard 
deviations are 4 and 5 respectively, find the mean and 
S.D. of the combined class of 100 male students. 

In a cricket season batsman A gets an average of 62 
runs per innings with S.D. of 16 runs while batsman B 
gets an average score of 42 runs with a S.D. of 10 runs 
in an equal number of innings. Discuss the consistency 
of both the batsmen. 

The following data relate to the number of runs scored 
by batsmen A and B in nine innings : 


Scores of A: 5 40 25 35 12 48 60 53 20 
Scores of B: 25 40 18 65 10 51 22 40 42 


Find (i) Who is a better run getter ? 
(ii) Who is more consistent as a batsman ? 


Following are the marks obtained by two students A 
and B in 10 tests of 100 marks each : 


Tests:                 1  2  3  4  5  6  7  8  9  10 
Marks obtained by A:   44 80 76 48 52 72 68 56 60 54 
Marks obtained by B:   48 75 54 60 63 69 72 51 57 66 


If the consistency of performance is the criterion for 
awarding a prize, who should get the prize ? 


Answers 


6. (i) 25 (ii) 7.0 (iii) 8.0 

7. M.D.=5.316, S.D.=6.53 

8. 2.92 

9. 2.16, S.D. will remain unchanged. 

10. S.D. will be doubled, i.e., S.D.=4.32 

11. (a) Absolute dispersion is greater in Statistics as 8.0 > 7.6. 
(b) Relative dispersion is greater in Economics as C.V. 
(Statistics)=10.26 and C.V. (Economics)=10.41 

12. S.D. too will be doubled. 

13. 5.23 

14. C.V. (A)=25.8, C.V. (B)=23.81. Batsman B is more 
consistent. 

15. (i) The total runs scored by batsman A=298. The 
total runs scored by batsman B=313. 
Thus, batsman B is a better run getter. 
(ii) The C.V. of runs scored by batsman A=53.88. 
The C.V. of runs scored by batsman B=47.30. B 
is a more consistent batsman. 

16. C.V. (A)=19.3 and C.V. (B)=14.0. B is more consistent 
and should get the prize. 


6 


CORRELATION 


6.1. Introduction 
In previous chapters we have confined ourselves to univariate 
distributions, i.e., distributions relating to one quantitative 
variable only. In practice we come across a number of situations 
involving the study of two or more variables. For example, 
consider the scores of five students in mathematics and physics 
as under : 

Student                  1    2    3    4    5 
Scores in Maths (x)      10   7    9    6    5 
Scores in Physics (y)    8    6    10   7    4 

Here each student assumes values on two variables x (score 
in maths) and y (score in physics) simultaneously and as such 
we get a bivariate distribution. Similarly, distributions involving 
more than two variables are called multivariate distributions. 
The present chapter deals with the study of bivariate distributions 
in which two variables may be inter-dependent, i.e., they 
may co-vary. In case the change in one variable appears to be 
accompanied by a change in the other variable, the two variables 
are said to be correlated and this inter-dependence is called 
correlation or covariation. In short, the tendency of simultaneous 
variation between two variables is called correlation or covariation. 
For example, there may exist a relationship between 
heights and weights of a group of students; similarly, the scores of 
students in two different subjects are expected to have an 
inter-dependence or relationship between them. Measuring the 
degree of relationship or covariation between two variables is 
the subject matter of correlation analysis. 


6.2. Types of correlation 
In a bivariate distribution, the correlation may be: 


6.2.1. Either positive or negative, and 
6.2.2. Either linear or non-linear (i.e., curvilinear) 


6.2.1. Positive and Negative Correlation 

When the increase in one variable is accompanied by a corresponding 
increase in the other, the correlation is said to be 
positive. In short, two variables covarying in the 
same direction are called positively correlated. For example, 
we expect a positive correlation between heights and weights of 
a group of individuals or the scores obtained by a group of 
students on two tests of mental ability. 

If, on the other hand, the covariation between the two 
variables is in opposite directions, i.e., an increase in one 
variable is accompanied by a corresponding decrease in the other, the 
correlation is said to be negative. As an example 
of negative correlation, we may consider the variation in the 
consumption of some commodity in respect of its price. Here 
an increase in price is expected to decrease the consumption. 


6.2.2. Linear and Non-linear or Curvilinear Correlation 

The distinction between linear and non-linear correlation is 
based upon the ratio of change between the variables. In 
perfect linear correlation the amount of change in one variable 
bears a constant ratio to the amount of change in the other. 
For example, consider the scores of eight individuals on two 
tests x and y. The pairs of scores are as follows : 

Individuals:        1    2    3    4    5    6    7    8 
Score on test x:    7    6    4    9    8    7    6    10 
Score on test y:    9    8    6    11   10   9    8    12 

In the above table we observe that each individual scores 2 
points higher on test y than his score on test x. This simply 
means that the covariation between the above two variables is 
expressible in the form y = x + 2, which is an equation representing 
a straight line, i.e., a perfect positive linear relationship, in 
which correlation between x and y will be +1. 


6.3. Methods of studying correlation (ungrouped data) 

The various methods of studying correlation between two 
variables in the case of ungrouped data may be listed as 
below: 


6.3.1. Scatter diagram method 
6.3.2. Karl Pearson's coefficient of correlation 
6.3.3. Spearman's coefficient of rank correlation 


6.3.1. Scatter diagram method 

A scatter diagram or dot diagram is a graphic device for 
drawing certain conclusions about the correlation between two 
variables. In preparing a scatter diagram, the observed pairs 
of observations are plotted as dots on graph paper in a 
two-dimensional space, taking the measurements on variable x 
along the horizontal axis and those on variable y along the 
vertical axis. The placement of these dots on the graph reveals 
whether the variables change in the same or in opposite 
directions. Scatter diagrams showing various degrees of 
correlation are given, as an example, in Fig. 6.1. 

In case all the dots lie on a straight line of positive 
slope [Fig. 6.1 (a)], we have a perfect positive correlation 
between the two variables, i.e., r = +1. Similarly, if all the 
dots lie on a straight line of negative slope [Fig. 6.1 (b)], 
the correlation between the two variables will be perfectly 
negative, and in such a case r = -1. Should the dots lie close 
to a straight line of positive slope [Fig. 6.1 (c)] or negative 
slope [Fig. 6.1 (d)], we have a high degree of positive or 
negative correlation between the two variables. If the dots do 
not follow a pattern along a straight line, as in Fig. 6.1 (e), 
we have no correlation or zero correlation and conclude that no 
linear relationship exists between the two variables x and y. 
In view of the above discussion, it is now clear that the 
correlation between two variables, varying from -1 to +1, is a 
measure of their linear relationship. The greater the scatter 
of the dots from the straight line on the graph, the lesser the 
correlation. 

[Fig. 6.1. Scatter diagrams showing various degrees of 
correlation: (a) perfect positive correlation, r = +1; 
(b) perfect negative correlation, r = -1; (c) high degree 
positive correlation; (d) high degree negative correlation; 
(e) zero correlation] 


6.3.2. Coefficient of correlation 

This is also known as Karl Pearson's coefficient of correlation 
or product moment correlation coefficient. It is one of the most 
widely used algebraic methods of finding correlation between 
two variables. The coefficient of correlation, denoted by r, 
gives an exact idea about the degree of linear relationship 
between two variables and is defined as: 

\[ r = \frac{\text{covariance}(x, y)}{\sigma_x \sigma_y} = \frac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y} \tag{6.1} \]

Again, since $[\sum (x-\bar{x})(y-\bar{y})]/n$ is taken as a measure of 
covariance between x and y [cov (x, y)], we can say 

\[ r = \frac{\sum (x-\bar{x})(y-\bar{y})}{n \, \sigma_x \sigma_y} \tag{6.2} \]

\[ r = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sqrt{\sum (x-\bar{x})^2 \, \sum (y-\bar{y})^2}} \tag{6.3} \]

where, n = the number of paired observations. 
cov (x, y) = the covariance between variables x and y. 
$\sigma_x$ = the standard deviation of variable x. 
$\sigma_y$ = the standard deviation of variable y. 
$\sum (x-\bar{x})^2$ = the sum of squared deviations of x taken from 
the mean $\bar{x}$. 
$\sum (y-\bar{y})^2$ = the sum of squared deviations of y taken from 
the mean $\bar{y}$. 
$\sum (x-\bar{x})(y-\bar{y})$ = the sum of the products of deviations of 
x and y from $\bar{x}$ and $\bar{y}$ respectively. 

The value of the coefficient of correlation r, as obtained by 
using formula (6.1), (6.2) or (6.3), shall always lie between 
-1 and +1. When r = -1 or +1, the correlation between the two 
variables is said to be perfectly negative or perfectly positive. 
An intermediate value of r between -1 and +1 indicates the 
extent of linear relationship between x and y, whereas its sign 
tells about the direction of this relationship. The value r = 0 
means no linear relationship between the two variables. It is, 
however, wrong to say that r = 0.8 indicates a covariability 
which is twice that indicated by r = 0.4. A coefficient of 
correlation, as discussed above, is a pure number which is 
independent of the unit of measurement. 

Karl Pearson's coefficient of correlation can be obtained 
by using one of the following methods: 

6.3.2. (a) Direct Method. 
6.3.2. (b) Short-cut Method. 


6.3.2. (a) Direct Method of calculating coefficient of 
correlation (ungrouped data) 

For ungrouped data, formula (6.3) is used for calculating the 
coefficient of correlation between two variables. This formula 
may be further simplified for computation purposes as under: 

\[ r = \frac{n \sum xy - (\sum x)(\sum y)}{\sqrt{[n \sum x^2 - (\sum x)^2]\,[n \sum y^2 - (\sum y)^2]}} \tag{6.4} \]

Here, $\sum x$ = the sum of observations on variable x. 
$\sum y$ = the sum of observations on variable y. 
$\sum x^2$ = the sum of squares of observations on variable x. 
$\sum y^2$ = the sum of squares of observations on variable y. 
$\sum xy$ = the sum of the products of observations on x 
and y variables. 

It should be noted that formula (6.3) is suitable in cases 
where the mean values $\bar{x}$ and $\bar{y}$ are not fractions. In case 
these happen to be fractional numbers, we should apply formula 
(6.4) for computing r. The computation procedure will be 
clear from the following examples. 


Example 6.1. Calculate the coefficient of correlation between 
the scores of two students on five testings as given below: 

Testings:                       1    2    3    4    5 
Scores of first student (x):   10    8   12    9   11 
Scores of second student (y):  16   15   19   17   18 


Solution 

In this example, we first use formula (6.3) for calculating r. 
The needed calculations are given in Table 6.1. 


TABLE 6.1 
Computation of r 

S. No.    x     y    (x-x̄)   (y-ȳ)   (x-x̄)²   (y-ȳ)²   (x-x̄)(y-ȳ) 
  1      10    16      0      -1       0        1          0 
  2       8    15     -2      -2       4        4          4 
  3      12    19      2       2       4        4          4 
  4       9    17     -1       0       1        0          0 
  5      11    18      1       1       1        1          1 
n=5   Σx=50 Σy=85     0       0      10       10          9 


Here $\bar{x} = \sum x / n = 50/5 = 10$ and $\bar{y} = \sum y / n = 85/5 = 17$. 

Using formula (6.3), the coefficient of correlation 

\[ r = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sqrt{\sum (x-\bar{x})^2 \, \sum (y-\bar{y})^2}} \]

Putting values from the table, we have 

\[ r = \frac{9}{\sqrt{10 \times 10}} = \frac{9}{10} = 0.90 \]

Thus, there is a high degree of positive correlation between 
the scores of the two students. 
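
The arithmetic of Example 6.1 can also be checked mechanically. The following is only a minimal sketch of formula (6.3) in Python; the function name pearson_r is our own choice for illustration and is not part of the text. 

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r from deviations about the means, as in formula (6.3)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations and sums of squared deviations
    sum_dev_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_dev_xx = sum((a - mean_x) ** 2 for a in x)
    sum_dev_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_dev_xy / sqrt(sum_dev_xx * sum_dev_yy)

x = [10, 8, 12, 9, 11]    # scores of first student (Example 6.1)
y = [16, 15, 19, 17, 18]  # scores of second student (Example 6.1)
print(round(pearson_r(x, y), 2))  # 0.9
```

The three sums inside the function correspond to the last three column totals of Table 6.1 (9, 10 and 10). 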


Example 6.2. Calculate the coefficient of correlation for the 
data in Example 6.1. Use formula (6.4). 


Solution 

The necessary calculations are given in Table 6.2. 


TABLE 6.2 
Computation of r 

S. No.     x      y      x²      y²      xy 
  1       10     16     100     256     160 
  2        8     15      64     225     120 
  3       12     19     144     361     228 
  4        9     17      81     289     153 
  5       11     18     121     324     198 
n=5     Σx=50  Σy=85  Σx²=510  Σy²=1455  Σxy=859 


Putting the values from the table in formula (6.4), we have 

\[
\begin{aligned}
r &= \frac{n \sum xy - (\sum x)(\sum y)}{\sqrt{[n \sum x^2 - (\sum x)^2]\,[n \sum y^2 - (\sum y)^2]}} \\
  &= \frac{5 \times 859 - 50 \times 85}{\sqrt{[5 \times 510 - (50)^2]\,[5 \times 1455 - (85)^2]}} \\
  &= \frac{4295 - 4250}{\sqrt{[2550 - 2500]\,[7275 - 7225]}} \\
  &= \frac{45}{\sqrt{50 \times 50}} = \frac{45}{50} \\
  &= 0.90
\end{aligned}
\]

which is the same as we obtained in Example 6.1. Still, 
formula (6.4) involves a lot of computational work, which 
becomes more tedious when the given variate values are large. 
To overcome this difficulty, we use a formula based on 
deviations of the variates from some suitably chosen values, 
as given in the following method of computing r. 
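
As a cross-check on Example 6.2, formula (6.4) may be sketched in Python as below. This is an illustrative sketch only; the name pearson_r_raw is an assumption of ours. It works directly on the raw sums, so no means need be computed. 

```python
from math import sqrt

def pearson_r_raw(x, y):
    """Pearson's r from raw sums, as in formula (6.4)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xx = sum(a * a for a in x)              # sum of squares of x
    sum_yy = sum(b * b for b in y)              # sum of squares of y
    sum_xy = sum(a * b for a, b in zip(x, y))   # sum of products
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return numerator / denominator

x = [10, 8, 12, 9, 11]    # data of Example 6.1
y = [16, 15, 19, 17, 18]
print(round(pearson_r_raw(x, y), 2))  # 0.9
```

For this data the numerator is 45 and the denominator is 50, matching the hand computation. 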


6.3.2. (b) Short-cut method for calculating r 

According to this method, the coefficient of correlation is 
calculated by using the following formula: 

\[ r = \frac{n \sum dx\,dy - (\sum dx)(\sum dy)}{\sqrt{[n \sum dx^2 - (\sum dx)^2]\,[n \sum dy^2 - (\sum dy)^2]}} \tag{6.5} \]

Here, dx = (x - A) = the deviations of x taken from assumed 
mean A. 
dy = (y - B) = the deviations of y taken from assumed 
mean B. 
$\sum dx$ = the sum of deviations dx. 
$\sum dy$ = the sum of deviations dy. 
$\sum dx^2$ = the sum of squared deviations dx². 
$\sum dy^2$ = the sum of squared deviations dy². 
$\sum dx\,dy$ = the sum of the products of deviations dx and dy. 

Computation is clarified in the following example. 


Example 6.3. The scores of a class of 10 students on a mid- 
term examination (x) and on the final examination (y) are as 
follows: 


x :  79  52  73  74  83  96  90  86  65  70 
У БОСО TI-NEd42 49:1 785. 96 .82 ..698 73 


Find Pearson's coefficient of correlation. 


Solution 

In the present example the figures are large enough to 
suggest the application of the short-cut method of computing r. 
Here deviations of x and y are taken from A=75 and B=65 
respectively. The necessary calculations are tabulated in 
Table 6.3. 

Putting the values from Table 6.3 in formula (6.5), the 
coefficient of correlation between x and y 

\[
\begin{aligned}
r &= \frac{n \sum dx\,dy - (\sum dx)(\sum dy)}{\sqrt{[n \sum dx^2 - (\sum dx)^2]\,[n \sum dy^2 - (\sum dy)^2]}} \\
  &= \frac{10 \times 916 - 18 \times 72}{\sqrt{[10 \times 1526 - (18)^2]\,[10 \times 2854 - (72)^2]}} \\
  &= \frac{9160 - 1296}{\sqrt{(15260 - 324)(28540 - 5184)}} \\
  &= \frac{7864}{\sqrt{14936 \times 23336}} = \frac{7864}{122.21 \times 152.76} \\
  &\approx 0.42
\end{aligned}
\]

Here, there is a moderate positive correlation between x and y. 
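
The short-cut formula (6.5) gives the same r whatever assumed means are chosen. The sketch below illustrates this on the small data set of Example 6.1; the function name and the particular values of A and B are our own assumptions, picked only for illustration. 

```python
from math import sqrt

def pearson_r_shortcut(x, y, A, B):
    """Pearson's r from deviations about assumed means A and B, formula (6.5)."""
    n = len(x)
    dx = [a - A for a in x]   # deviations of x from assumed mean A
    dy = [b - B for b in y]   # deviations of y from assumed mean B
    sum_dx, sum_dy = sum(dx), sum(dy)
    sum_dx2 = sum(d * d for d in dx)
    sum_dy2 = sum(d * d for d in dy)
    sum_dxdy = sum(p * q for p, q in zip(dx, dy))
    numerator = n * sum_dxdy - sum_dx * sum_dy
    denominator = sqrt((n * sum_dx2 - sum_dx ** 2) * (n * sum_dy2 - sum_dy ** 2))
    return numerator / denominator

x = [10, 8, 12, 9, 11]    # data of Example 6.1
y = [16, 15, 19, 17, 18]
# Two arbitrary (hypothetical) choices of assumed means give the same r
print(round(pearson_r_shortcut(x, y, A=10, B=16), 2))  # 0.9
print(round(pearson_r_shortcut(x, y, A=9, B=18), 2))   # 0.9
```

Choosing A and B close to the centre of the data merely keeps the deviations small, which is what makes the method convenient for hand computation. 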
6.4. Properties of r 


6.4.1. Effect on r when a constant is added to one or 
both variables 

To observe the effect on the coefficient of correlation r when 
a constant is added to one or both the variables, we consider an 
example where x and y represent the original scores of two 
students on four testings. Now, we add a score of 10 to each 
score of the first student and 20 to each score of the second 
student, and represent these scores by x' and y' respectively. 
The calculations needed for computing r for the original and new 
pairs of observations are given in Table 6.4. 

Therefore, by using formula (6.4), the coefficient of correla- 
tion for the original scores x and y will be: 

\[
\begin{aligned}
r &= \frac{n \sum xy - (\sum x)(\sum y)}{\sqrt{[n \sum x^2 - (\sum x)^2]\,[n \sum y^2 - (\sum y)^2]}} \\
  &= \frac{4 \times 96 - 15 \times 22}{\sqrt{[4 \times 71 - (15)^2]\,[4 \times 138 - (22)^2]}} \\
  &= \frac{384 - 330}{\sqrt{[284 - 225]\,[552 - 484]}} \\
  &= \frac{54}{\sqrt{59 \times 68}} = \frac{54}{63.34} \\
  &= +0.85
\end{aligned}
\]

The same formula for the new scores can be written as 

\[ r = \frac{n \sum x'y' - (\sum x')(\sum y')}{\sqrt{[n \sum x'^2 - (\sum x')^2]\,[n \sum y'^2 - (\sum y')^2]}} \]

Putting values from the table, we have 

\[
\begin{aligned}
r &= \frac{4 \times 1416 - 55 \times 102}{\sqrt{[4 \times 771 - (55)^2]\,[4 \times 2618 - (102)^2]}} \\
  &= \frac{5664 - 5610}{\sqrt{[3084 - 3025]\,[10472 - 10404]}} \\
  &= \frac{54}{\sqrt{59 \times 68}} = \frac{54}{63.34} \\
  &= +0.85
\end{aligned}
\]

which is the same as we obtained in the case of the original 
scores. Thus, we observe that the value of the coefficient of 
correlation r remains unchanged when a constant is added to one 
or both variables. The same is true when a constant is subtracted 
from one or both variables. 


6.4.2. Effect on r when one or both variables are 
multiplied by some constant 


To observe the effect of multiplying the variables by some 
constant on the value of r, we arbitrarily multiply the original 
scores of the first and second student in the previous example by 
10 and 20 respectively. The new scores obtained by such 
multiplication will be represented by variables x' and y' 
respectively. The correlation coefficient between x' and y' may 
then be calculated as under: 


TABLE 6.5 
Effect on r when variables are multiplied by some 
constant 

              New Scores 
S. No.      x'       y'       x'²       y'²       x'y' 
  1         10       80       100      6400       800 
  2         30       60       900      3600      1800 
  3         50      140      2500     19600      7000 
  4         60      160      3600     25600      9600 
n=4    Σx'=150  Σy'=440  Σx'²=7100  Σy'²=55200  Σx'y'=19200 


Therefore, the correlation coefficient between x' and y' 

\[ r = \frac{n \sum x'y' - (\sum x')(\sum y')}{\sqrt{[n \sum x'^2 - (\sum x')^2]\,[n \sum y'^2 - (\sum y')^2]}} \]

Putting values from the table, 

\[
\begin{aligned}
r &= \frac{4 \times 19200 - (150)(440)}{\sqrt{[4 \times 7100 - (150)^2]\,[4 \times 55200 - (440)^2]}} \\
  &= \frac{76800 - 66000}{\sqrt{(28400 - 22500)(220800 - 193600)}} \\
  &= \frac{10800}{\sqrt{5900 \times 27200}} = \frac{10800}{12668} \\
  &= +0.85
\end{aligned}
\]

Thus the value of r is still the same as that obtained for the 
original scores in the previous section. In view of this, we con- 
clude that the value of the coefficient of correlation remains 
unaltered if one or both sets of variate values are multiplied by 
some positive constant. Again, since division by a constant may 
be treated as multiplication by the inverse of that constant, the 
value of r will remain unchanged even when one or both sets of 
variate values are divided by some positive constant. 

Comparing the above two results in sections 6.4.1 and 
6.4.2, we can say that the new variables x' = ax + c and 
y' = by + d, with a and b positive, have the same correlation as 
x and y, i.e., such a linear transformation of x and y does not 
change r. (A negative multiplier applied to only one of the 
variables reverses the sign of r but leaves its magnitude 
unchanged.) In simple words, if the x scores are multiplied by a 
constant and another constant is added to them, and similarly the 
y scores are changed by multiplying by and adding constants, then 
the correlation between the new sets of scores is the same as 
that between x and y. 
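
The two properties can be tried out numerically. In the sketch below the original scores x = 1, 3, 5, 6 and y = 4, 3, 7, 8 are an assumption on our part: they are recovered by dividing the new scores of Table 6.5 by 10 and 20, and they reproduce the sums used in section 6.4.1. The function name pearson_r and the last transformation (3x + 7, 2y - 5) are likewise only illustrative. 

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r from raw sums, as in formula (6.4)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_xx = sum(a * a for a in x)
    sum_yy = sum(b * b for b in y)
    return (n * sum_xy - sum_x * sum_y) / sqrt(
        (n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))

# Original scores, assumed here by scaling back the entries of Table 6.5
x = [1, 3, 5, 6]
y = [4, 3, 7, 8]

r_original = pearson_r(x, y)
r_shifted = pearson_r([a + 10 for a in x], [b + 20 for b in y])     # section 6.4.1
r_scaled = pearson_r([10 * a for a in x], [20 * b for b in y])      # section 6.4.2
r_linear = pearson_r([3 * a + 7 for a in x], [2 * b - 5 for b in y])  # both at once

for r in (r_original, r_shifted, r_scaled, r_linear):
    print(round(r, 2))  # 0.85 each time
```

All four values agree because shifting the scores leaves the deviations from the mean unchanged, while a positive multiplier scales the numerator and the denominator of r by the same factor. 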


6.5. Correlation (Grouped Data) 

In case the number of pairs of measurements on two 
variables x and y is large, we group the measurements in the 
form of a two-way frequency distribution, which is also called a 
bivariate frequency distribution. For the choice of class size and 
limits for a bivariate frequency distribution, more or less the 
same rules are followed as in the case of a univariate frequency 
distribution discussed in chapter 3. To clarify the idea, we 
consider bivariate data on the scores earned by a class of 25 
students in Statistics and Mathematics examinations, the maximum 
score for each paper being 50. The scores are given in Table 6.6. 


TABLE 6.6 
Scores of 25 students in Statistics and Mathematics 
(Ungrouped data) 

Student   Scores in    Scores in   |  Student   Scores in    Scores in 
  No.     Stat. (x)    Maths. (y)  |    No.     Stat. (x)    Maths. (y) 
   1         21           22       |    13         37           43 
   2         20           24       |    14         39           41 
   3         17           22       |    15         46           45 
   4         19           26       |    16         13           15 
   5         28           35       |    17         14           17 
   6         36           28       |    18         42           36 
   7         32           34       |    19         41           35 
   8         36           35       |    20         45           32 
   9         38           33       |    21         29           15 
  10         38           42       |    22         23           24 
  11         23           18       |    23         27           25 
  12         27           19       |    24         25           26 
                                   |    25         24           29 


The ungrouped bivariate data may then be grouped into a 
bivariate frequency table as shown in Table 6.7. Here, we 
classify each pair of variates simultaneously in the two classes, 
one representing the score in Statistics (x) and the other in 
Mathematics (y). 

From Table 6.6, we observe that there are two students 
whose scores in Statistics as well as in Mathematics fall in 
score class 10-19. Therefore, in Table 6.7, the frequency in the 
cell representing these classes together (first cell in Table 6.7) 
is 2. Similarly, there are 3 students whose scores in Statistics 
fall in class 20-29 while their scores in Mathematics fall in 
class 10-19 and, therefore, the frequency for this cell is 3. 
Further, there is no student who scored in Statistics within 
class 30-39 and, at the same time, in Mathematics within class 
10-19 and, as such, the frequency for this cell is zero. Other 
cell frequencies in Table 6.7 may be similarly interpreted. The 
marginal totals in Table 6.7 show the separate frequency 
distributions of the two variables: the last column gives the 
distribution of Mathematics scores (y) and the last row that of 
Statistics scores (x). For example, 5 students have scored in 
Mathematics in class 10-19, 9 in class 20-29, 7 in class 30-39 
and 4 in class 40-49. Similarly 4, 10, 7 and 4 represent the 
number of students scoring in Statistics in classes 10-19, 20-29, 
30-39 and 40-49 respectively. 


TABLE 6.7 
Bivariate frequency table 

Scores in                  Scores in Statistics (x) 
Mathematics (y)     10-19    20-29    30-39    40-49    Total 
10-19                 2        3        -        -        5 
20-29                 2        6        1        -        9 
30-39                 -        1        3        3        7 
40-49                 -        -        3        1        4 
Total                 4       10        7        4      N=25 
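
The tallying that turns the ungrouped scores into a bivariate frequency table can be sketched as follows. This is only an illustration: the scores are typed in as read from Table 6.6, and the helper name score_class is our own. 

```python
from collections import Counter

# Scores of the 25 students, in student order, as read from Table 6.6
stat = [21, 20, 17, 19, 28, 36, 32, 36, 38, 38, 23, 27, 37,
        39, 46, 13, 14, 42, 41, 45, 29, 23, 27, 25, 24]
maths = [22, 24, 22, 26, 35, 28, 34, 35, 33, 42, 18, 19, 43,
         41, 45, 15, 17, 36, 35, 32, 15, 24, 25, 26, 29]

def score_class(score, width=10):
    """Label of the class interval (e.g. '20-29') containing a score."""
    lower = (score // width) * width
    return f"{lower}-{lower + width - 1}"

# Each student is tallied in the cell (Maths class, Statistics class)
cells = Counter((score_class(m), score_class(s)) for s, m in zip(stat, maths))

classes = ["10-19", "20-29", "30-39", "40-49"]
# Rows are Mathematics classes, columns are Statistics classes
table = [[cells[(row, col)] for col in classes] for row in classes]

for row, counts in zip(classes, table):
    print(row, counts, "total", sum(counts))
```

The printed rows give the cell frequencies together with the row totals 5, 9, 7 and 4; an empty cell of the printed table appears here as 0. 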


6.5.1 Pearson's coefficient of correlation in grouped data 
In the study of correlation in a bivariate frequency distribu- 
tion, Pearson's coefficient of correlation is given by 

\[ r = \frac{N \sum f\,dx\,dy - (\sum f\,dx)(\sum f\,dy)}{\sqrt{[N \sum f\,dx^2 - (\sum f\,dx)^2]\,[N \sum f\,dy^2 - (\sum f\,dy)^2]}} \tag{6.6} \]

The formula (6.6) is similar to that used for ungrouped data, 
i.e., the formula (6.5). The only difference is that the deviations 
in the formula are also multiplied by the respective frequencies. 

Formula (6.6) involves the following steps in computation. 

(i) Take the mid-points of the classes as x and y for both the 
variables. 

(ii) Consider deviations or step-deviations for both the 
variables and denote them as dx and dy respectively. 

(iii) Multiply dx, dy and the corresponding frequency in 
the cell and write the resulting figure in the upper right hand 
corner of the cell. 

(iv) Add all these cornered values in the cells to get 
$\sum f\,dx\,dy$. 

(v) Multiply dx by the frequencies of the variable x and 
obtain the sum as $\sum f\,dx$. 

(vi) Multiply dy by the respective frequencies of the 
variable y and add them to get $\sum f\,dy$. 

(vii) Multiply dx² by the corresponding frequencies of the 
variable x and obtain the sum as $\sum f\,dx^2$. 

(viii) Multiply dy² by the corresponding frequencies of the 
variable y and obtain the sum as $\sum f\,dy^2$. 

Use these values in formula (6.6) to get the coefficient of 
correlation between the variables x and y. The procedure is 
clarified in the following example. 


Example 6.4. Find the coefficient of correlation between scores 
obtained in Statistics and Mathematics by 25 students as given 
in Table 6.7. 


Solution 

To facilitate the computation involved in calculating r in the 
case of a bivariate frequency distribution, we prepare Table 6.8 
showing the necessary calculations. 


TABLE 6.8 
Computation of r in grouped data 
(each cell shows the frequency f, with the cornered value 
f dx dy in parentheses) 

Scores in Mathematics            Scores in Statistics 
                          10-19    20-29    30-39    40-49 
Classes   Mid-pt.  dy     x=14.5   x=24.5   x=34.5   x=44.5     f   f dy  f dy²  f dx dy 
           (y)            dx=-2    dx=-1    dx=0     dx=1 
10-19     14.5    -2      2 (8)    3 (6)    -        -          5   -10    20     14 
20-29     24.5    -1      2 (4)    6 (6)    1 (0)    -          9    -9     9     10 
30-39     34.5     0      -        1 (0)    3 (0)    3 (0)      7     0     0      0 
40-49     44.5     1      -        -        3 (0)    1 (1)      4     4     4      1 

f                         4        10       7        4        N=25  -15    33     25 
f dx                     -8       -10       0        4        Σf dx = -14 
f dx²                    16        10       0        4        Σf dx² = 30 
f dx dy                  12        12       0        1        Σf dx dy = 25 


The last column and the last row of Table 6.8 provide a check 
for the total $\sum f\,dx\,dy$, which is 25 in both cases. Cornered 
values in the table represent the product f dx dy. For example, 
in the first cell, we multiply dx = -2 by dy = -2 to get dx dy = 4 
and then further multiply it by the cell frequency 2 to get 
f dx dy = 2 x 4 = 8. Other cornered values are calculated in 
a similar way. The readers are already familiar with the com- 
putation procedure given in the last four columns and rows of 
the table. Now putting values from the table in formula (6.6), 
the coefficient of correlation between x and y will be 

\[
\begin{aligned}
r &= \frac{N \sum f\,dx\,dy - (\sum f\,dx)(\sum f\,dy)}{\sqrt{[N \sum f\,dx^2 - (\sum f\,dx)^2]\,[N \sum f\,dy^2 - (\sum f\,dy)^2]}} \\
  &= \frac{25 \times 25 - (-14)(-15)}{\sqrt{[25 \times 30 - (-14)^2]\,[25 \times 33 - (-15)^2]}} \\
  &= \frac{625 - 210}{\sqrt{[750 - 196]\,[825 - 225]}} \\
  &= \frac{415}{\sqrt{554 \times 600}} = \frac{415}{576.54} \\
  &= 0.72 \text{ approx.}
\end{aligned}
\]

which shows a high positive correlation between scores in 
Statistics and Mathematics. 
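
The steps of formula (6.6) lend themselves to a short program. The sketch below is illustrative only; it assumes step-deviations dx and dy of -2, -1, 0, 1 (taken from the mid-point 34.5 in units of the class width 10), and the function name grouped_r is ours. 

```python
from math import sqrt

# Cell frequencies of Table 6.7: rows are Mathematics classes (y),
# columns are Statistics classes (x); empty cells are entered as 0
freq = [
    [2, 3, 0, 0],   # Maths 10-19
    [2, 6, 1, 0],   # Maths 20-29
    [0, 1, 3, 3],   # Maths 30-39
    [0, 0, 3, 1],   # Maths 40-49
]
dx = [-2, -1, 0, 1]   # step-deviations of the Statistics classes
dy = [-2, -1, 0, 1]   # step-deviations of the Mathematics classes

def grouped_r(freq, dx, dy):
    """Pearson's r for a bivariate frequency table, as in formula (6.6)."""
    N = sum(sum(row) for row in freq)
    fx = [sum(col) for col in zip(*freq)]   # marginal frequencies of x
    fy = [sum(row) for row in freq]         # marginal frequencies of y
    sum_fdx = sum(f * d for f, d in zip(fx, dx))
    sum_fdy = sum(f * d for f, d in zip(fy, dy))
    sum_fdx2 = sum(f * d * d for f, d in zip(fx, dx))
    sum_fdy2 = sum(f * d * d for f, d in zip(fy, dy))
    sum_fdxdy = sum(freq[i][j] * dx[j] * dy[i]
                    for i in range(len(dy)) for j in range(len(dx)))
    numerator = N * sum_fdxdy - sum_fdx * sum_fdy
    denominator = sqrt((N * sum_fdx2 - sum_fdx ** 2) *
                       (N * sum_fdy2 - sum_fdy ** 2))
    return numerator / denominator

print(round(grouped_r(freq, dx, dy), 2))  # 0.72
```

Since r is unaffected by a change of origin and scale, using step-deviations in place of the actual mid-points does not alter the result. 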


6.6. Spearman's rank correlation coefficient 

The methods discussed above measure the linear relationship 
between two quantitative variables. But there may be situations 
in psychology and education when it is not possible to make 
definite measurements on the variables, the assessments being 
intuitive and subjective. Examples are the evaluation of a 
group of students on the basis of leadership ability, the 
ordering of women in a beauty contest, the ranking of students 
in order of preference, or the ranking of pictures according to 
their aesthetic values. In such cases objects or individuals 
may be ranked and arranged in order of merit or proficiency on 
two variables, and when these two sets of ranks covary or have 
agreement between them, we measure the degree of relationship 
by a rank correlation. 

A formula, called the rank correlation coefficient, to measure 
the extent or degree of correlation between two sets of ranks 
was developed by Spearman. It is denoted by the Greek letter 
ρ (called rho) and given as 

\[ \rho = 1 - \frac{6 \sum D^2}{n(n^2-1)} \tag{6.7} \]

Here ρ = Spearman's rank correlation coefficient 
D = the difference between paired ranks 
n = the number of items ranked 

In fact ρ is Pearson's coefficient of correlation between two 
sets of ranks. Thus the value of ρ can also be interpreted in 
the same way as Karl Pearson's coefficient of correlation. It 
varies between -1 and +1. The value +1 stands for a perfect 
positive agreement or relationship between two sets of ranks, 
while ρ = -1 implies a perfect negative relationship. In case of 
no relationship or agreement between ranks, the value of ρ = 0. 
The discussion will be clearer from the following: 

Let us first consider the case when the ranks are in a state of 
complete agreement, as shown in the following table. 

An example for ρ = 1 

Ranks on x    Ranks on y    Rank difference 
   (R1)          (R2)         D = R1 - R2        D² 
     1             1               0              0 
     2             2               0              0 
     3             3               0              0 
     4             4               0              0 
     5             5               0              0 
   n = 5                                      ΣD² = 0 

Here R1 stands for the ranks on variable x and R2 for the 
ranks on variable y. Using formula (6.7), the rank correlation 
coefficient will be 

\[ \rho = 1 - \frac{6 \sum D^2}{n(n^2-1)} = 1 - \frac{6 \times 0}{5(25-1)} = 1 - 0 = 1.0 \]

which shows a perfect positive correlation between the 
given ranks. The case in which we get ρ = -1 is shown below: 

TABLE 6.9 
An example for ρ = -1 

Ranks on x    Ranks on y    Rank difference 
   (R1)          (R2)         D = R1 - R2        D² 
     1             5              -4             16 
     2             4              -2              4 
     3             3               0              0 
     4             2               2              4 
     5             1               4             16 
   n = 5                                      ΣD² = 40 

Here, 

\[ \rho = 1 - \frac{6 \sum D^2}{n(n^2-1)} = 1 - \frac{6 \times 40}{5(25-1)} = 1 - 2 = -1. \]

Some Examples 


Example 6.5. Find Spearman's rank correlation coefficient 
for the data in Example 6.1. 


Solution 

In this example, we are given the actual scores of the two 
students and not the ranks. Thus, in order to find the rank 
correlation coefficient, we first assign ranks to the scores of 
each student as R1 and R2, as shown in Table 6.10. Here rank 1 
is assigned to the highest score and so on. The same procedure 
is applied for both the variables. 


TABLE 6.10 

Testing     x      y      R1     R2     D = R1 - R2     D² 
   1       10     16      3      4         -1           1 
   2        8     15      5      5          0           0 
   3       12     19      1      1          0           0 
   4        9     17      4      3          1           1 
   5       11     18      2      2          0           0 
 n = 5                                               ΣD² = 2 


Using formula (6.7), we have 

\[ \rho = 1 - \frac{6 \sum D^2}{n(n^2-1)} = 1 - \frac{6 \times 2}{5 \times 24} = 1 - 0.10 = 0.90 \]

showing a high degree of correlation between x and y, which 
also agrees well with the value r = 0.90 obtained in 
Example 6.1. 

Example 6.6. Ten students are ranked in order of preference 
by three examiners in the following order: 


Ist Examiner (Ri): 1 5 4 8 9 
2nd Examiner (203 42202727 
3rd Examiner (OG 5 
Find which two of the examiners have the maximum agreement 
between their preferences. 


Solution 

Here we calculate the rank correlation coefficients among the 
sets of ranks given by the different examiners as measures of 
the agreement reached in their ranks. The needed calculations 
are shown in Table 6.11. 

Now, the rank correlation between the ranks given by the 1st 
and 2nd examiners, say $\rho_{12}$, will be 

\[ \rho_{12} = 1 - \frac{6 \sum D_{12}^2}{n(n^2-1)} = 1 - \frac{6 \times 74}{10 \times 99} = 1 - 0.45 = 0.55 \]

Similarly, the rank correlation between the ranks of the 2nd 
and 3rd examiners, say $\rho_{23}$, is given by 

\[ \rho_{23} = 1 - \frac{6 \sum D_{23}^2}{n(n^2-1)} = 1 - \frac{6 \times 44}{10 \times 99} = 1 - 0.27 = 0.73 \]

Lastly, the rank correlation between the ranks of the 1st and 
3rd examiners, say $\rho_{13}$, can be calculated as 

\[ \rho_{13} = 1 - \frac{6 \sum D_{13}^2}{n(n^2-1)} = 1 - \frac{6 \times 158}{10 \times 99} = 1 - 0.96 = 0.04 \]

Comparing $\rho_{12}$, $\rho_{23}$ and $\rho_{13}$, we observe that the 2nd and 
3rd examiners have the highest degree of agreement 
($\rho_{23}$ = 0.73) between their sets of ranks, and so we may 
conclude that these two examiners most closely agree in their 
preferences. 


6.6.1. Rank correlation for tied ranks 

Suppose we are given n pairs of variate values for two 
variables x and y, and some of these values in either of the two 
series of observations are equal. In such a case we assign an 
equal mean rank to each set of equal variate values. Thus, if 
two variate values are ranked equal at 3rd place, they are each 
given the mean rank (3+4)/2 = 3.5 and the next variate value is 
ranked 5th. Similarly, if three variate values are ranked equal 
at 6th place, they are each assigned the mean rank (6+7+8)/3 = 7 
and the next variate value (i.e., 9th in order) will be ranked 
9th, as the ranks 6th, 7th and 8th have already been used. 
Spearman's rank correlation coefficient formula corrected for 
these tied ranks is then as follows: 

\[ \rho = 1 - \frac{6 \left[ \sum D^2 + \sum \dfrac{m(m^2-1)}{12} \right]}{n(n^2-1)} \tag{6.8} \]

Here m = the number of equal variate values with a common 
rank. 

Again, since there can be more than one set of such equal 
variate values, the correction factor [m(m²-1)/12] is to be 
added to the value of ΣD² once for every set, using the value of 
m for that set. This repeated correction is represented by the 
term Σ[m(m²-1)/12] in formula (6.8). The procedure will be clear 
from the following example. 

Example 6.7. Find Spearman's rank correlation between 
the scores of 10 students in the first and second tests as given 
below: 

Student:           1   2   3   4   5   6   7   8   9  10 
First Test (x):   65  63  67  64  68  62  70  66  68  67 
Second Test (y):  69  66  68  65  70  66  68  64  71  67 


Solution 

In ranking the scores on the 1st test (x), we give rank 1 to the 
highest score of 70. Then, there are two next highest scores of 
68, ranked with mean rank (2+3)/2 = 2.5 each. Again there 
are two next lower tied scores of 67, ranked with mean rank 
(4+5)/2 = 4.5 each. The next highest score of 66 is then ranked 
as 6 (because we have already used the ranks from one to five). 
Other lower untied scores are ranked as usual. The same 
procedure is applied in ranking the scores on the 2nd test (y). 
In the process, we observe that there are 2 tied ranks for score 
68 and 2 for score 67 on the first test. Similarly, there are 2 
tied ranks for score 68 and 2 for score 66 on the 2nd test. Thus, 
in all, there are four sets of tied ranks with m = 2 in each case. 
Now, using formula (6.8), Spearman's rank correlation is obtained 
as under 

\[ \rho = 1 - \frac{6 \left[ \sum D^2 + \dfrac{m(m^2-1)}{12} + \dfrac{m(m^2-1)}{12} + \dfrac{m(m^2-1)}{12} + \dfrac{m(m^2-1)}{12} \right]}{n(n^2-1)} \]

(see that the correction factor is added for each case of 
tied scores) 

\[
\begin{aligned}
\rho &= 1 - \frac{6 \left[ 58.5 + \dfrac{2(4-1)}{12} + \dfrac{2(4-1)}{12} + \dfrac{2(4-1)}{12} + \dfrac{2(4-1)}{12} \right]}{10(100-1)} \\
     &= 1 - \frac{6 \times 60.50}{990} = 1 - 0.37 \\
     &= 0.63
\end{aligned}
\]

showing a positive correlation between the two sets of scores. 
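
The ranking with mean ranks and the correction of formula (6.8) can be sketched in Python as below. This is an illustration under our own naming (average_ranks, tie_correction and spearman_rho_ties are not from the text), applied to the scores of Example 6.7. 

```python
from collections import Counter

def average_ranks(scores):
    """Rank 1 = highest score; tied scores share the mean of the ranks they occupy."""
    ordered = sorted(scores, reverse=True)
    rank_of = {}
    for value in set(scores):
        first = ordered.index(value) + 1      # first rank occupied by this value
        count = ordered.count(value)          # number of tied scores, m
        rank_of[value] = first + (count - 1) / 2
    return [rank_of[s] for s in scores]

def tie_correction(scores):
    """Sum of m(m^2 - 1)/12 over every set of m equal scores."""
    return sum(m * (m * m - 1) / 12 for m in Counter(scores).values() if m > 1)

def spearman_rho_ties(x, y):
    """Spearman's rank correlation corrected for ties, as in formula (6.8)."""
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = tie_correction(x) + tie_correction(y)
    return 1 - 6 * (sum_d2 + correction) / (n * (n * n - 1))

x = [65, 63, 67, 64, 68, 62, 70, 66, 68, 67]   # first test (Example 6.7)
y = [69, 66, 68, 65, 70, 66, 68, 64, 71, 67]   # second test (Example 6.7)
print(average_ranks(x))                   # ranks on the first test
print(round(spearman_rho_ties(x, y), 2))  # 0.63
```

Here the sum of squared rank differences comes to 58.5 and the four tie corrections add 2 in all, giving the figure 60.50 used in the example. 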


6.6.2. Comparing ρ and r for tied ranks 

As mentioned earlier, ρ is Pearson's coefficient of correla- 
tion between two sets of ranks. Thus, when ranks are treated 
as scores and there are no tied positions, ρ in (6.7) is equal 
to r. But with tied observations, the formula in (6.8) should be 
used for calculating ρ, as it makes allowance for the correction 
needed to bring it into agreement with r. However, the 
correction in this formula is negligible and may be safely 
ignored when the number of ties is small; as such, the formula 
for ρ in (6.7) can be safely used as a close approximation to r. 


Problem Set 6 


1. Explain the following concepts: 

(i) Covariation 
(ii) Correlation 
(iii) Linear correlation 
(iv) Positive and negative correlation 
(v) Perfect positive and negative correlation 

2. What is a scatter diagram? How does it help us in 
studying correlation between two variables? Explain 
the limitations of this method. 

3. How do you interpret the coefficient of correlation? 
Explain various methods of studying correlation between 
two variables. 

4. What will be your interpretation if 
(i) r=0 (ii) r=+1 (iii) r=-1 (iv) r=+0.80 and 
(v) r=-0.80? 

5. What is rank correlation? How is it calculated? 

6. Mention the limits within which the following vary: 
(i) Karl Pearson's coefficient of correlation. 
(ii) Spearman's rank correlation coefficient. 

7. Define the product moment correlation coefficient and 
mention its properties. 

8. The following table shows the scores of 10 students in 
Hindi and English: 

Student No.:         1   2   3   4   5   6   7   8   9  10 
Scores in Hindi:    45  70  65  30  90  40  50  75  85  60 
Scores in English:  35  90  70  40  95  40  60  80  80  50 

Find out Karl Pearson's coefficient of correlation 
between the two sets of scores. 

9. The grades of 9 students of a class on a midterm report 
(x) and on the final examination (y) are as follows: 


х ДІ 50 TN 572“ Sl 94 06 99 67 
у 82 66 78 34 47 85 99 96 68 


Find the correlation between x and y. 

10. Compute and interpret the correlation coefficient for 
the following data related to grades of 6 students 
selected at random: 

Maths grade:    70  92  80  74  65  83 
English grade:  74  84  63  87  78  90 

11. The following shows how 10 students were ranked 
according to their achievements in both theory and 
practical of a biology course. Find out the coefficient 
of rank correlation. 


Students No. : 


19029534 4858507 во 10 
Ranks in Theory 8 3 9 2 710-4 6 1 5 
Ranks.in Practical: 9 SLO EI ES TESA 2 6 


12. Calculate the coefficient of rank correlation for the data 
in problem 8. 

13. Calculate the coefficient of rank correlation for the data 
in problem 9. 

14. Three judges in a contest ranked eight candidates A, 
B, C, D, E, F, G and H in order of their preference 
and submitted the choices shown in the following table. 


Candidate : AS BY ‘CD 


E- "EAG: Н 
First judge: Die 2) 8 ETAU Mua 07 
Second judge: 4 5 Je 3 8 I б 
Third judge: — | CE Sd ug CIN, 7 


Find which two of the judges have the maximum agreement 
between their preferences. 

15. The coefficient of rank correlation in a beauty contest 
for 10 participants was 0.7. However, it was later 
discovered that the rank difference of a participant was 
read as 6 instead of 5. Find the correct value of the rank 
correlation coefficient. 


16. The following table gives the number of students 
obtaining different marks in Economics and Statistics. 
Calculate the coefficient of correlation between marks 
obtained in the above two subjects. 

Marks in                 Marks in Statistics 
Economics       30-39    40-49    50-59    60-69    Total 
30-39             3        1        1        -        5 
40-49             2        6        1        2       11 
50-59             1        2        2        1        6 
60-69             -        1        1        1        3 
Total             6       10        5        4       25 

17. The following table gives the frequency, according to 
age groups, of marks obtained by 75 students in an 
intelligence test. Calculate the coefficient of correlation 
between age and intelligence. 

Age in years 
Test rupe mie 
marks 10 20 21 22 23 
0—19 4 4 2 1 1 12 
2 16 
20—39 3 5 4 2 
40—59 3 6 8 1 3 = 
8 4 22 
60—79 = 4 5 
= | | === — 
75 
Total 10 19 20 16 10 5 
Answers 


8. 0.90 
9. 0.97 
10. 0.24 
11. 0.85 
12. 0.90 
13. 0.67 
14. r12 = 0.83, r23 = 0.89, r13 = 0.66. Thus, the second and 
third judges have the maximum agreement between their 
preferences. 
15. 0.767 
16. 0.39 
17. 0.31 


PROBABILITY AND BINOMIAL PROBABILITY 
DISTRIBUTION 


7.1. Introduction 

Statistics is the science of decision-making in the face of 
uncertainty. The statistician is generally interested in drawing 
conclusions or inferences from experiments which involve 
uncertainties. For example, with personal observations and 
wishful thinking, we generally make statements like “Mohan 
will probably pass in 10th class”; “Girls in comparison to boys 
have a better chance of success in the examination”; “Eighty 
per cent post-graduate students are likely to be married within 
2 years”; “The chances of teams A and B winning a certain 
tennis match are even”; “Most probably it will rain today” or 
“The chance of an examinee guessing a correct answer to a 
certain question is 20 per cent.” In all such statements we have 
expressed an outcome or event with uncertainty, but because 
of past experience or from an understanding of the structure of 
the experiment we have some degree of confidence in the 
validity of our statements. For making such statements, con- 
clusions or inferences which have validity, an understanding of 
probability theory is essential for a statistical investigator. 

The probability theory provides a means of getting an idea 
of the likelihood of occurrence of different events resulting from 
a random experiment in terms of quantitative measures ranging 
between zero and one. The probability is zero for an impos- 
sible event and one for an event which is certain to occur. The 
other degrees of uncertainty or the likelihood of occurrence of 
events are indicated by probabilities ranging between zero and 
one. 


7.2. Terminology 

For an understanding of the concept of probability theory, 
the following terminology must be clearly grasped. 


7.2.1. Trial and events

Let us consider an experiment in which results depend on
chance and in a particular trial the result can be one of the
several possible outcomes of the experiment. When the experiment
is repeated under essentially the same conditions, the
repetitions may result in different possible outcomes. A single
performance of the experiment is called a trial and all its
possible outcomes are described as events. For example, tossing
of a coin is an experiment, and one toss of the coin will
constitute a trial. The experiment of tossing a coin has only two
outcomes or events, head or tail. Another experiment may be of
tossing a die in which there are six possible outcomes or events,
i.e., the turning up of any of the six numbers 1, 2, 3, 4, 5 or 6.


7.2.2. Equally likely events 
Two or more events are said to be equally likely if one of 
them cannot be expected to occur in preference to any other 
event. In other words, events are called equally likely if the
likelihood of the occurrence of every event is the same. For
example, in tossing an unbiased coin, the head and tail have an
equal chance of turning up. Similarly, each one of the faces
marked 1, 2, 3, 4, 5 or 6 has an equal chance of coming on top
in a trial of tossing a die and therefore all the six events
are equally likely. As against it, we may consider an example
of a multiple choice type question having 4 choices out of
which 1 is correct. If the question is answered by guessing, the
4 choices are equally likely to be chosen, but the events
‘answering correctly’ and ‘answering wrongly’ are not equally
likely, since only 1 of the 4 choices is correct and 3 are wrong.


7.2.3. Mutually exclusive events 

Two events are called mutually exclusive when occurrence
of one implies that the other cannot occur. For example, in
tossing a coin either head or tail occurs, i.e., the two events,
head and tail, cannot occur simultaneously. Similarly, in
throwing a die, the occurrence of any number excludes the
occurrence of the others and as such these six events too are
mutually exclusive. On the other hand, we may consider cases
where a child may have an I.Q. over 100 and pass or fail in an
examination. Here, events ‘passing in examination’ and ‘I.Q.
is over 100’ are not mutually exclusive. Also, if a card is
drawn from a pack of 52 cards, the events ‘the card is an ace’ and
‘the card is a spade’ are not mutually exclusive, because of
the possibility that the card is the ace of spades.
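
The card illustration above can be checked by treating each event as a set of outcomes: two events are mutually exclusive exactly when their sets share no outcome. The following is only a minimal sketch; the names `deck`, `aces`, `spades`, `kings` and `mutually_exclusive` are illustrative assumptions and do not come from the text.

```python
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["spades", "hearts", "diamonds", "clubs"]

# Every card of a standard 52-card pack as a (rank, suit) pair.
deck = set(product(RANKS, SUITS))

# Events expressed as sets of favourable outcomes.
aces = {card for card in deck if card[0] == "A"}
spades = {card for card in deck if card[1] == "spades"}
kings = {card for card in deck if card[0] == "K"}


def mutually_exclusive(event_a, event_b):
    """Two events are mutually exclusive when no outcome belongs to both."""
    return event_a.isdisjoint(event_b)


print(mutually_exclusive(aces, spades))  # False: the ace of spades is in both
print(mutually_exclusive(aces, kings))   # True: no card is both an ace and a king
```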


7.2.4. Favourable and unfavourable cases

The outcomes in an experiment which are favourable to an
event in which we are interested are called favourable cases and
all other outcomes are known as unfavourable cases. The sum
of the favourable and unfavourable cases is equal to the
exhaustive number of events in an experiment. For example, suppose
when a die is thrown we want to know the probability of the
event that 3 or 6 turns up. Then the two cases, i.e., turning up
of 3 or 6, are favourable to our desired event, while turning up
of 1, 2, 4 or 5 are four cases unfavourable to the event.
Similarly, if a card is drawn from a well shuffled pack of 52 cards,
we may be interested in the event that it is a king. There are
four cases (drawing of any of the 4 king cards) favourable to
the desired event and the remaining 48 cases (drawing of any of
the 48 other cards) are unfavourable to the desired event.


7.2.5. Simple events 

In simple events we consider the occurrence or non-occurrence
of single events, i.e., a simple event cannot be decomposed
into a combination of other events. Thus, for the dice
example, the occurrence of each of the six numbers is called a
simple event. Similarly, drawing of a particular card from a
pack is a simple event.

7.2.6. Compound or Joint events 
In contrast to simple events, there are compound events
which imply the simultaneous occurrence of two or more simple
events. Thus a compound event can be decomposed into simple
events. For example, if a bag contains 4 white and 6 red balls
and we make a draw of 2 balls at random, then the events that
‘both are white’ or ‘one is white and one is red’ are compound
events. Similarly, if two persons A and B draw one card each
from a pack of cards simultaneously, such results as ‘both
cards are kings’ are compound events.


The compound events are further classified as— 


7.2.6. (a) Independent events 
7.2.6. (b) Dependent events. 


7.2.6. (a) Independent events 

If two or more events occur in such a way that the
occurrence of one does not affect the occurrence of another, they are
said to be independent events. For example, if a coin is tossed
twice, the result of the second throw would in no way be
affected by the result of the first throw. Similarly, suppose a bag
contains 4 white and 7 red balls and 2 balls are drawn one
by one in such a way that the first ball drawn is replaced before
the second one is drawn. In this situation the two events
‘first ball is white’ and ‘second ball is red’ will be independent,
since the composition of the balls in the bag remains unaltered
when the second draw is made.


7.2.6. (b) Dependent events 

If the occurrence of one event influences the occurrence of
the other, then the second event is said to be dependent on the
first. For example, in the above example, if we do not replace
the first ball drawn, this will alter the composition of balls in
the bag while making the second draw and therefore the event
of ‘drawing a red ball’ in the second draw will depend on the event
(first ball is red or white) occurring in the first draw. Similarly,
if a person draws a card from a full pack and does not replace it,
the result of the draw made afterwards will be dependent on
the first draw.


7.3. Definition of Probability 
There are two definitions of probability— 


7.3.1. ‘Classical’ or ‘a-priori’ definition of probability.
7.3.2. ‘Empirical’ definition of probability.


7.3.1. Classical or ‘a-priori’ definition of probability

If the trial of an experiment results in n exhaustive, mutually
exclusive and equally likely cases and m of them are favourable
to the happening of an event A, then the probability of the event
A is given by

$$P(A) = \frac{m}{n} = \frac{\text{Favourable number of cases}}{\text{Exhaustive number of cases}} \qquad (7.1)$$

and the probability that the event does not happen is

$$P(\bar{A}) = \frac{n-m}{n} = \frac{\text{Number of cases unfavourable to event } A}{\text{Exhaustive number of cases}} \qquad (7.2)$$

In this way we observe that

$$P(A) + P(\bar{A}) = \frac{m}{n} + \frac{n-m}{n} = 1 \qquad (7.3)$$

or

$$P(A) = 1 - P(\bar{A}) \qquad (7.4)$$

and

$$P(\bar{A}) = 1 - P(A) \qquad (7.5)$$

That is, if we know the probability of an event A, then the
probability of its opposite or complementary event $\bar{A}$ is given
by formula (7.5).

It is now clear from the above definition that:

(i) the probability of an event is the ratio of the number of
favourable cases to the exhaustive number of cases in a
trial.
(ii) the probability of an event is a non-negative quantity, i.e.,
$P(A) \geq 0$.
(iii) the sum of the probabilities of happening and non-
happening of an event is always equal to one.
(iv) the probability of an event which cannot occur is 0.
(v) the probability of an event which is sure to happen is 1
and, therefore, the probability of happening of an event
ranges between 0 and 1, i.e., $0 \leq P(A) \leq 1$.


7.3.1. (a) Limitations of the classical definition of
probability

The classical definition of probability fails to give the
probability of an event in the following cases:

(i) When the various outcomes of a trial are not equally
likely. For example, if a die is so biased that it gives even
numbers more often than odd numbers, then the occurrence of
numbers on the die is not equally probable, while this condition
is necessary for calculating probability by the present
definition. Similarly, we cannot apply this definition to find the
probability of success of a candidate in an examination, because
the events of ‘success’ and ‘failure’ are not equally likely.

(ii) If the exhaustive number of cases (n) in a trial is
infinite, then also the definition fails to give the required
probability. For example, in considering the probability that a bulb
will burn less than 1500 hours, we have no way of enumerating
the total number of cases and the number of favourable cases.


7.3.2. Empirical definition of probability

If a trial is repeated a number of times under essentially
homogeneous and identical conditions, then the limiting value
of the ratio of the number of times the event A happens to
the number of trials, as the number of trials becomes infinite
(indefinitely large), is called the probability of the event A.

Symbolically, if in N trials, an event A happens m times,
then the probability of A is given by

$$P(A) = \lim_{N \to \infty} \frac{m}{N} \qquad (7.6)$$

As an example of empirical probability, let us consider an
experiment of tossing an unbiased (fair) coin 10 times resulting
in 6 heads and 4 tails. The relative frequency of head is thus
0.6. However, if the experiment is carried out a very large
number of times we expect that the relative frequency of heads
will become stable and tend towards 0.50. The relative
frequency obtained in a given number of trials provides an
estimate of the empirical probability of the event.

In the present chapter, we shall restrict ourselves to classical
probability only.
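
The stabilising of the relative frequency described in the coin example can be illustrated by simulation. The sketch below is not part of the original text: it assumes a pseudo-random generator is an acceptable stand-in for a fair coin, and the function name `relative_frequency_of_heads` and the seed are illustrative choices.

```python
import random


def relative_frequency_of_heads(n_tosses, seed=0):
    """Toss a simulated fair coin n_tosses times and return m / N for heads."""
    rng = random.Random(seed)  # fixed seed so that repeated runs agree
    heads = sum(1 for _ in range(n_tosses) if rng.choice("HT") == "H")
    return heads / n_tosses


# The ratio m / N is an estimate of P(Head); it is expected to settle
# near 0.50 as the number of tosses grows.
for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequency_of_heads(n))
```

With only 10 tosses the ratio can be far from 0.50, as in the 6 heads and 4 tails of the example; with many tosses it is expected to lie close to 0.50.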


7.4. Calculation of probability of simple events¹

Example 7.1. An unbiased coin is tossed. What is the
probability of getting head?


Solution
Since a tossed coin can result in two outcomes only, i.e.,
Head and Tail, the exhaustive number of cases = 2.
Now one of them, i.e., the occurrence of head, is a favourable
event; therefore, the probability of the required event is

$$P(\text{Head}) = \frac{\text{Favourable number of cases}}{\text{Exhaustive number of cases}} = \frac{1}{2}$$


Example 7.2. A bag contains 3 white and 2 black balls.
Find the probability of drawing

(i) a white ball,

(ii) a black ball.


Solution
(i) Since there are 3+2=5 balls in the bag, one ball can be
drawn out of these in 5 ways. Thus the exhaustive number of
cases is 5. There are 3 ways in which a white ball can come.
Therefore, the probability of drawing a white ball
= $\frac{3}{5}$ = 0.6.

(ii) In this case also the exhaustive number of cases will
remain 5. The number of ways of drawing a black ball = 2.
Hence the probability of drawing a black ball
= $\frac{2}{5}$ = 0.4.


1. The concept of ‘Permutation’ and ‘Combination’ is very useful for
enumerating the favourable and exhaustive number of cases in an
experiment. The concept is presented in appendix 2 given at the end
of the book.


Example 7.3. A die is rolled once. Find the probability of
getting


(i) a number 3.
(ii) an even number.


Solution
(i) A six faced die can result in 6 exhaustive cases and out
of these, the outcome 3 is the only case favourable to
the desired event. Therefore, the probability of getting 3
= $\frac{1}{6}$ = 0.17.
(ii) In this case 3 outcomes, viz., turning up of 2, 4 or 6 (even
numbers), are favourable to the desired event. Therefore,
the probability of getting an even number = $\frac{3}{6}$ = 0.50.

Example 7.4. Two cards are drawn from a well-shuffled pack.
Find the probability that


(i) both are spades
(ii) both are kings
(iii) one is a king and one a queen.

Solution
(i) Probability of both cards being spades

Let A stand for the event that both cards are spades.
Out of 52 cards, 2 can be drawn in

$$^{52}C_2 = \frac{52 \times 51}{2 \times 1} = 1326$$

ways, which is the exhaustive number of cases. Again, there are
13 spades and out of these 2 can be selected in

$$^{13}C_2 = \frac{13 \times 12}{2 \times 1} = 78$$

ways, which is the favourable number of cases. Therefore,

$$P(A) = \frac{78}{1326} = \frac{1}{17}$$


(ii) Probability of 2 kings

Let B denote the event that both cards are kings. There
are 4 kings in the pack and out of these 2 can be selected in

$$^{4}C_2 = \frac{4 \times 3}{2 \times 1} = 6$$

cases which are favourable to the desired event B. Hence,

$$P(B) = \frac{6}{1326} = \frac{1}{221}$$


(iii) Probability of 1 king and 1 queen

Let C denote the event of getting one king and one queen.
There are four kings and four queens in the pack. Therefore,
the favourable ways of drawing a king and a queen simultaneously
= 4 × 4 = 16. Hence,

$$P(C) = \frac{16}{1326} = \frac{8}{663}$$
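
The counts in Example 7.4 are numbers of combinations, which Python's standard `math.comb` computes directly. The short check below is a sketch; the variable names are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

# Exhaustive cases: any 2 cards out of 52.
total_pairs = comb(52, 2)

p_both_spades = Fraction(comb(13, 2), total_pairs)                # 2 of the 13 spades
p_both_kings = Fraction(comb(4, 2), total_pairs)                  # 2 of the 4 kings
p_king_and_queen = Fraction(comb(4, 1) * comb(4, 1), total_pairs)  # 1 king and 1 queen

print(total_pairs)        # 1326
print(p_both_spades)      # 1/17
print(p_both_kings)       # 1/221
print(p_king_and_queen)   # 8/663
```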
Example 7.5. What is the chance that a leap year selected at
random should contain fifty-three Sundays?


Solution
Let A be the event that a leap year has 53 Sundays. A leap
year has 366 days and therefore contains 52 complete weeks
and 2 days over. These 2 days may form the following
combinations:


(i) Monday and Tuesday.
(ii) Tuesday and Wednesday.
(iii) Wednesday and Thursday.
(iv) Thursday and Friday.
(v) Friday and Saturday.
(vi) Saturday and Sunday.
(vii) Sunday and Monday.


Thus, there are 7 exhaustive cases in the trial, and out of
these the last two cases are favourable (containing Sunday)
to the desired event. Therefore,

$$P(A) = \frac{2}{7}$$




Example 7.6. Two dice are tossed. Find the probability of
getting the sum of numbers as 7.


Solution


Let A be the event of getting the sum of numbers as 7.
Two dice rolled will result in 6×6=36 exhaustive cases.

Again, a sum of seven on two dice can be obtained with the
following 6 combinations:


(1, 6), (2, 5), (3, 4), (4, 3), (5, 2) and (6, 1)


which are favourable to event A. Therefore,

$$P(A) = \frac{6}{36} = \frac{1}{6}$$
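
The 36 exhaustive cases and the 6 favourable ones in Example 7.6 can be listed mechanically rather than by hand. A minimal sketch, with illustrative variable names:

```python
from fractions import Fraction
from itertools import product

# All ordered results (first die, second die) of rolling two dice.
outcomes = list(product(range(1, 7), repeat=2))

# Cases favourable to the event "the sum of the numbers is 7".
favourable = [pair for pair in outcomes if sum(pair) == 7]

p_sum_seven = Fraction(len(favourable), len(outcomes))

print(len(outcomes))  # 36
print(favourable)     # [(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)]
print(p_sum_seven)    # 1/6
```

Changing the condition `sum(pair) == 7` gives the probability of any other sum in the same way.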


Example 7.7. Of every 100 students who are selected, we
find, on the average:


10 students were in grade A.
25 students were in grade B.
45 students were in grade C.
20 students were in grade D.


Find the probability that a student selected at random will
be having


(i) grade A.
(ii) grade A or B.
(iii) grade B, C or D.


Solution


(i) Probability that the selected student is having grade A.

Let the events that a selected student is from grade A, B, C
and D be denoted by A, B, C and D respectively.

Here one student out of 100 can be selected in 100 exhaustive
cases. Also there are 10 students in grade A and out of
these one can be selected in 10 favourable cases.
Therefore,

$$P(A) = \frac{10}{100} = 0.10$$


(ii) Probability that the selected student is from grade A or B.

Since the selected student is desired to have either grade A
or B, the favourable number of cases is 10+25=35.
Hence

$$P(A \text{ or } B) = \frac{35}{100} = 0.35$$


(iii) Probability that the selected student is from grade B, C or D.

In this situation, the number of cases favourable to our desired
event is 25+45+20=90. Hence

$$P(B, C \text{ or } D) = \frac{90}{100} = 0.90$$


7.5. Rules of probability

In the following sub-sections, two rules of calculating
probabilities are given which are useful in simplifying the
procedure of calculating probabilities of mutually exclusive and
compound events. The two rules are:

7.5.1. Addition rule of probability.
7.5.2. Multiplicative rule of probability.


7.5.1. Addition rule of probability (when events are
mutually exclusive)

The probability that one of two or more mutually exclusive
events will occur is the sum of the probabilities of the individual
events.

Symbolically, if P(A) and P(B) are the respective probabilities
of two mutually exclusive events A and B, then the probability
that one of them happens is given by

$$P(A \text{ or } B) = P(A) + P(B) \qquad (7.7)$$

The rule can be extended to any number of mutually
exclusive events A, B, C, ..., M as under:

$$P(A \text{ or } B \text{ or } C \ldots \text{ or } M) = P(A) + P(B) + P(C) + \ldots + P(M) \qquad (7.8)$$

The rule is clarified in the following examples:


Example 7.8. In Example 7.7, use the addition rule to find the
probability that the selected student will have either grade A
or B.


Solution


Let A and B be the events that the selected student has
grade A and B respectively. Therefore,

$$P(A) = \frac{10}{100} = 0.10, \qquad P(B) = \frac{25}{100} = 0.25$$

Since the events A and B are mutually exclusive, the
probability that any one of them happens can be obtained by
using the addition rule given in formula (7.7), i.e.,

$$P(A \text{ or } B) = P(A) + P(B) = 0.10 + 0.25 = 0.35$$

Example 7.9. If the chance of a player A winning a certain 
trophy is + and the chance of player B winning the same 


trophy is T What is the chance that one of them will win. 


Solution 
Let A be the event that player A wins and 
B be the event that player B wins, 


Given that— 


20) 1 
Р(А)-- = 
(A) g and P(B) 


Since the two events A and B are mutually exclusive (because
if A wins, B cannot, and vice versa), the probability
that one of them wins will be


P(A or B)=P(A)+P(B) 
1 1 


8 


Example 7.10. From a set of 20 cards numbered 1, 2, 3,
4, ..., 20, one is drawn at random. Find the probability that
its number is divisible by 3 or 7.


Solution

In this example there are 20 exhaustive cases.
Further, let A be the event that the card number is divisible by
3 and B the event that it is divisible by 7. Thus the cases
favourable to event A are the cards numbered 3, 6, 9, 12, 15
and 18, i.e., 6 in all, and those favourable to event B are cards
numbered 7 and 14, i.e., 2 in all. Therefore,

$$P(A) = \frac{6}{20} \quad \text{and} \quad P(B) = \frac{2}{20}$$

Since the two events A and B are mutually exclusive (no
number up to 20 is divisible by both 3 and 7), the probability
that one of them happens (that is, the card number is divisible
by 3 or 7) is given by

$$P(A \text{ or } B) = P(A) + P(B) = \frac{6}{20} + \frac{2}{20} = \frac{8}{20} = 0.40$$
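
The addition rule in Example 7.10 can be verified by comparing it with a direct count of the favourable cards. The following sketch uses illustrative names and first checks the condition under which formula (7.7) applies, namely that the two events have no case in common.

```python
from fractions import Fraction

cards = range(1, 21)  # cards numbered 1 to 20

div_by_3 = {n for n in cards if n % 3 == 0}  # cases favourable to event A
div_by_7 = {n for n in cards if n % 7 == 0}  # cases favourable to event B

# Formula (7.7) requires A and B to be mutually exclusive.
assert div_by_3.isdisjoint(div_by_7)

p_a = Fraction(len(div_by_3), len(cards))
p_b = Fraction(len(div_by_7), len(cards))

p_by_rule = p_a + p_b                                       # addition rule
p_by_count = Fraction(len(div_by_3 | div_by_7), len(cards))  # direct count

print(sorted(div_by_3), sorted(div_by_7))  # [3, 6, 9, 12, 15, 18] [7, 14]
print(p_by_rule, p_by_count)               # 2/5 2/5
```

If the set of cards were extended to include 21, the two events would overlap and the simple sum would count that card twice.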


7.5.2. Multiplicative rule of probability (when events are
independent)

The probability of two or more independent events occurring
together is the product of the probabilities of the individual
events.

Symbolically, if P(A) and P(B) are the respective
probabilities of happening of two independent events A and B, then
the probability that the two events will happen together
(compound probability) is given by

$$P(A \text{ and } B) = P(A) \cdot P(B) \qquad (7.9)$$

The rule can be extended to any number of independent
events A, B, C, ..., M as under:

$$P(A \text{ and } B \text{ and } C \ldots \text{ and } M) = P(A) \cdot P(B) \cdot P(C) \ldots P(M) \qquad (7.10)$$


The rule will become more clear from the following
examples.

Example 7.11. Two dice are rolled. Find the probability of
getting a number 5 on each die.


Solution


Here, we want each die to turn up 5. Let A be the event that
number 5 turns up on the first die and B the event that 5 turns
up on the second die. Then

$$P(A) = \frac{1}{6}, \qquad P(B) = \frac{1}{6}$$

Since the two events are independent, the probability that
both occur simultaneously is

$$P(A \text{ and } B) = P(A) \cdot P(B) = \frac{1}{6} \times \frac{1}{6} = \frac{1}{36}$$


Example 7.12. A bag contains 3 white and 4 red balls. Two
balls are drawn one by one. Find the probability that both
drawings result in white balls, in case the first ball drawn was
replaced before drawing the second one.


Solution


Let A be the event that the first drawn ball is white and
B be the event that the second drawn ball is white.
Since we return the first ball drawn to the bag, the events
A and B are independent. Also, since

$$P(A) = \frac{3}{7} \quad \text{and} \quad P(B) = \frac{3}{7}$$

therefore,

$$P(A \text{ and } B) = P(A) \cdot P(B) = \frac{3}{7} \times \frac{3}{7} = \frac{9}{49}$$

Example 7.13. There are three notorious students in a class 
and each has a probability of 0.2 of being absent on a given 
day. Assuming absence of one is independent of the absence of 
another, what is the probability that all three will be absent on 
the same day? 


Solution 
Let A, B and C be the events that the respective students are
absent. Then, given that


P(A)=0.2, P(B)=0.2 and P(C)=0.2


Since A, B and C are independent events, the probability
of their simultaneous occurrence will be


P(A and B and C)=P(A). P(B). P(C)
=0.2×0.2×0.2
=0.008.


Example 7.14. A problem in Statistics is given to three
students A, B and C whose chances of solving it are $\frac{1}{2}$, $\frac{1}{3}$ and $\frac{1}{4}$
respectively. What is the probability that the problem will be
solved?


Solution
The problem will be solved if at least one of them is
successful in solving it. In such cases we first consider


(i) the probability that A fails to solve the problem

$$P(\bar{A}) = 1 - P(A) = 1 - \frac{1}{2} = \frac{1}{2} \quad \text{[using formula (7.5)]}$$

(ii) the probability that B fails to solve the problem

$$P(\bar{B}) = 1 - P(B) = 1 - \frac{1}{3} = \frac{2}{3}$$

(iii) the probability that C fails to solve the problem

$$P(\bar{C}) = 1 - P(C) = 1 - \frac{1}{4} = \frac{3}{4}$$


Since the three events in (i), (ii) and (iii) are independent,
the probability of their simultaneous occurrence, i.e.,
the probability that A, B and C all fail to solve the problem,
will be

$$P(\bar{A} \text{ and } \bar{B} \text{ and } \bar{C}) = P(\bar{A}) \cdot P(\bar{B}) \cdot P(\bar{C}) = \frac{1}{2} \times \frac{2}{3} \times \frac{3}{4} = \frac{1}{4} = 0.25$$

Therefore, the probability that at least one among A, B and
C will solve it (all three do not fail to solve the problem)

$$= 1 - P(\bar{A} \text{ and } \bar{B} \text{ and } \bar{C}) = 1 - 0.25 = 0.75$$
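
The method of Example 7.14 (find the probability that everyone fails, then take the complement) works for any number of independent attempts. The sketch below assumes the three chances are 1/2, 1/3 and 1/4, values which reproduce the 0.25 and 0.75 obtained in the solution; the function name `p_at_least_one` is illustrative.

```python
from fractions import Fraction
from math import prod


def p_at_least_one(success_probs):
    """Probability that at least one of several independent events occurs."""
    # Multiplicative rule (7.10) on the failures, then the complement (7.5).
    p_all_fail = prod(1 - p for p in success_probs)
    return 1 - p_all_fail


# Assumed chances of A, B and C solving the problem.
chances = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)]
print(p_at_least_one(chances))  # 3/4

# Example 7.13 uses the multiplicative rule directly: all three absent.
p_all_absent = prod([Fraction(1, 5)] * 3)
print(p_all_absent)  # 1/125, i.e. 0.008
```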


7.5.2 (a) Multiplicative Rule (when events are dependent) 

Before coming to multiplicative rule for dependent events, 
it is essential to know about the concept of conditional 
probability. 


7.5.2 (b) Conditional Probability

If the events A and B are dependent, so that the probability
of occurrence of B is affected by the occurrence of A, then
the probability of an event B occurring when it is known that
an event A has occurred is called the conditional probability
and is denoted by P(B/A). The term P(B/A) may be read
“the probability of occurrence of B, given that A has
already occurred”.

Now, the probability that both dependent events A and B
occur in that order is the probability that A occurs multiplied
by the conditional probability that B occurs given that A has
already occurred. Symbolically, this multiplicative rule can be
put as

$$P(A \text{ and } B) = P(A) \cdot P(B/A) \qquad (7.11)$$

Example 7.15. A bag contains 4 red and 4 blue balls. Two
balls are drawn one by one. Find the probability that the
first drawing gives a red ball and the second a blue ball if the
first ball drawn is not returned to the bag before the second
drawing.


Solution
Let A denote the event that the ball in the first drawing is
red and B denote the event that the ball in the second drawing
is blue.

Since the first ball drawn is not returned to the bag before
the second is drawn, the probability of occurrence of event B is
affected by the occurrence of A, i.e., the two events are dependent.
Now for calculating the probability of the simultaneous
occurrence of both events, we first find

$$P(A) = \frac{4}{8} = \frac{1}{2} \quad \text{[using formula (7.1)]}$$

After drawing a red ball (occurrence of event A), the ball
is not replaced and, therefore, at the time of the second drawing
the bag contains 3 red and 4 blue balls, i.e., 7 balls in all.

Therefore, the conditional probability of drawing a blue
ball (event B) when a red ball (event A) has already been drawn
will be

$$P(B/A) = \frac{4}{7}$$

Thus, using formula (7.11), the probability that A and B
occur in that order is given by

$$P(A \text{ and } B) = P(A) \cdot P(B/A) = \frac{1}{2} \times \frac{4}{7} = \frac{2}{7}$$
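
The effect of not replacing the first ball can be seen by listing every ordered pair of draws from the bag of Example 7.15, once without replacement and once with replacement. This is a sketch under the assumption that each ball is equally likely to be drawn; the names are illustrative.

```python
from fractions import Fraction
from itertools import permutations, product

bag = ["red"] * 4 + ["blue"] * 4  # 4 red and 4 blue balls


def p_red_then_blue(ordered_draws):
    """Share of equally likely ordered pairs that are (red, blue)."""
    draws = list(ordered_draws)
    hits = sum(1 for first, second in draws if (first, second) == ("red", "blue"))
    return Fraction(hits, len(draws))


# Without replacement: the same ball cannot be drawn twice (8 x 7 = 56 pairs).
p_dependent = p_red_then_blue(permutations(bag, 2))

# With replacement: the second draw sees the full bag again (8 x 8 = 64 pairs).
p_independent = p_red_then_blue(product(bag, repeat=2))

print(p_dependent)    # 2/7, agreeing with P(A) . P(B/A) = 1/2 x 4/7
print(p_independent)  # 1/4, agreeing with P(A) . P(B)   = 1/2 x 1/2
```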
Example 7.16. Suppose two of the 10 students in a class
are A and B. What is the probability of selecting the two
students A and B at random?

Solution


The two students A and B can be selected at random in the
following two mutually exclusive ways.


(1) A is selected first and B in the next draw.
(2) B is selected first and A in the next draw.


Now, for calculating the probability of event (1), let A
stand for the event that student A is selected first and B for
the event that B is selected next. Then,

$$P(A) = \frac{1}{10}$$

Again, the conditional probability of selecting student B,
given that A has been selected (now there are nine students left
in the class), is

$$P(B/A) = \frac{1}{9}$$

Therefore, the probability of the two events occurring
together, i.e., “A is selected first, then B”, is

$$P(A \text{ and } B) = P(A) \cdot P(B/A) = \frac{1}{10} \times \frac{1}{9} = \frac{1}{90}$$

Similarly, the probability of the event (2) can be calculated
as $\frac{1}{90}$. Hence, the probability of selecting the two students
A and B is

$$\frac{1}{90} + \frac{1}{90} = \frac{2}{90} = \frac{1}{45}$$

(using the additive rule of probability, as the two events
are mutually exclusive).

Example 7.17. Of all students, 25 per cent were put in
grade C. Of these students in grade C, 30 per cent are females.
What is the probability that a randomly selected student is a
female student having grade C?


Solution 
Let A be the event that the selected student receives grade 
C and B be the event that the selected student is female. 
Therefore, the event of a female having grade C is the 
compound event of A and B. Given that


P(A)=0.25 (25 per cent)
P(B/A)=0.30 (30 per cent)
Hence, P(A and B)=P(A) . P(B/A)
=0.25 × 0.30
=0.075.

Example 7.18. Three groups of children contain respectively
3 girls and 1 boy; 2 girls and 2 boys; 1 girl and 3 boys. One
child is selected at random from each group. Find the
probability that the three selected children include 1 girl and
2 boys.


Solution
As per the given conditions, 1 girl and 2 boys may be selected
in the following three mutually exclusive events A, B and C.


(i) Event A: girl from 1st group and boys from 2nd and
3rd groups.

(ii) Event B: girl from 2nd group and boys from 1st and
3rd groups.

(iii) Event C: girl from 3rd group and boys from 1st and
2nd groups.


Further, we observe that each of these events is itself a
compound event of three simple independent events. For
example, occurrence of event A includes the simultaneous
selection of a girl from 1st group, a boy from 2nd group and a
boy from 3rd group. Thus, the probability of event A is the
product of the probabilities of these events, i.e.,

$$P(A) = \frac{3}{4} \times \frac{2}{4} \times \frac{3}{4} = \frac{18}{64} = \frac{9}{32}$$

Similarly,

$$P(B) = \frac{2}{4} \times \frac{1}{4} \times \frac{3}{4} = \frac{6}{64} = \frac{3}{32}$$

$$P(C) = \frac{1}{4} \times \frac{1}{4} \times \frac{2}{4} = \frac{2}{64} = \frac{1}{32}$$

Since the three events A, B and C are mutually exclusive,
the probability that any one of them happens is
given by

$$P(A \text{ or } B \text{ or } C) = P(A) + P(B) + P(C) = \frac{9}{32} + \frac{3}{32} + \frac{1}{32} = \frac{13}{32}$$
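
Example 7.18 combines both rules, and its answer can be cross-checked by listing all 4 × 4 × 4 = 64 equally likely selections and counting those with exactly one girl. A minimal sketch with illustrative names:

```python
from fractions import Fraction
from itertools import product

# "G" stands for a girl and "B" for a boy in each group of four children.
groups = [
    ["G", "G", "G", "B"],  # 1st group: 3 girls and 1 boy
    ["G", "G", "B", "B"],  # 2nd group: 2 girls and 2 boys
    ["G", "B", "B", "B"],  # 3rd group: 1 girl and 3 boys
]

# One child from each group: every combination is equally likely.
selections = list(product(*groups))

# Favourable cases: exactly 1 girl (and therefore 2 boys).
favourable = [s for s in selections if s.count("G") == 1]

p_one_girl_two_boys = Fraction(len(favourable), len(selections))

print(len(selections), len(favourable))  # 64 26
print(p_one_girl_two_boys)               # 13/32
```

The 26 favourable cases split as 18 + 6 + 2, matching events A, B and C of the solution.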


7.6 Random variable and probability distribution 

A random variable is a numerical quantity determined by
the outcome of a random experiment. In a random experiment,
every outcome has a probability and this probability can be
assigned to each value of the random variable. For example,
if x is a random variable showing the numerical value of the
outcome in an experiment of rolling a six faced die, then x
has six possible outcomes 1, 2, 3, 4, 5 or 6 and to each outcome
there is an associated probability 1/6 as shown in Table 7.1.


TABLE 7.1
Probability distribution of the numbers when
a dice is tossed

x                          1     2     3     4     5     6
Probability of x = p(x)   1/6   1/6   1/6   1/6   1/6   1/6


Such a table listing all possible values (outcomes) of a
random variable x together with corresponding probabilities
p(x) is called a probability distribution of x. It is to be noted that
the sum of all probabilities in a probability distribution is 1, i.e.,

$$\sum_{x=1}^{6} p(x) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} + \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = 1.0$$

A random variable is either discrete or continuous. The
values which a discrete random variable takes are fixed numbers
such as 1, 2, 3, ... etc. or 1/2, 1, 3/2, ..., which are generally
separated from one another by a finite quantity. On the other
hand, a continuous variable can theoretically assume all values
within an interval. The distributions of discrete and continuous
random variables are accordingly called discrete and continuous
probability distributions.


7.7 Theoretical probability distributions 

In chapter 2, we have discussed observed frequency distribu- 
tions which are results or outcomes of actual observations and 
experimentations. For example, Table 2.8 represents the 
observed frequency distribution of scores of 50 students in a 
statistics paper. Similarly, we may classify data related to 
height and weight structure of the students in the form of an 
Observed frequency distribution. 

In theoretical probability distributions instead of observed 
scores there are all possible values of a random variable and 
the frequencies are replaced by actual probabilities, which 
depend on the nature of a random variable. For example, 
in Table 7.1, the probability for every outcome (occurrence of 
1, 2, 3, 4, 5 or 6) is given as 1/6 which we calculate on the 
basis of a theoretical consideration that all the outcomes are 
equally likely. In the case of tossing two coins simultaneously, 
the random variable x, representing the number of heads, 
assumes the values 0, 1 or 2 (no head, one head or 2 heads). 
Now, since the theoretical probability of head and tail is
equal, i.e., P(Head) = 1/2 = P(Tail), the respective theoretical
probabilities for each value of the variable (outcome) may be
obtained as given in Table 7.2.


TABLE 7.2


Probability distribution of the number of heads
when a pair of coins is tossed

Outcome     TT     HT or TH     HH     Total
x            0         1         2
P(x)        1/4       1/2       1/4     1.0


Here, in Table 7.2, the probabilities for various outcomes
x=0, 1 or 2 have been calculated as

$$P(x=0) = P(TT) = P(T) \cdot P(T) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$$

$$P(x=1) = P(HT \text{ or } TH) = P(HT) + P(TH) = P(H) \cdot P(T) + P(T) \cdot P(H) = \frac{1}{4} + \frac{1}{4} = \frac{1}{2}$$

$$P(x=2) = P(HH) = P(H) \cdot P(H) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$$

Now suppose the above experiment of tossing 2 coins is
repeated N=200 times, then the expected or theoretical
frequencies for x, denoted by $f_e(x)$, are obtained by using the
relation

$$f_e(x) = N \, p(x) \qquad (7.12)$$

These expected or theoretical frequencies for different values
of x=0, 1 and 2 are given in Table 7.3.

However, if the same experiment of tossing 2 coins is
empirically repeated 200 times, then the observed frequency
distribution of the number of heads might occur as shown in
Table 7.4.

TABLE 7.3


Theoretical frequency distribution of the number
of heads when a pair of coins is tossed 200 times

No. of Heads (x)     p(x)     f_e(x) = N p(x)
0                    1/4           50
1                    1/2          100
2                    1/4           50
Total                1.00        N=200

TABLE 7.4


Observed frequency distribution of the number of
heads when a pair of coins is tossed 200 times

No. of Heads (x)     Observed frequency f_o(x)
0                         47
1                        101
2                         52
Total                   N=200
Comparing the theoretical and observed frequency distributions
in Tables 7.3 and 7.4 respectively, we observe that theoretical
frequency distributions may not fully agree with the observed
or empirical distributions; yet it is very likely that the observed
distributions would become closer and closer to the theoretical
distributions when the number of repetitions of an experiment is
large. In view of this, theoretical probability or frequency
distributions can serve as models for representing an observed
probability or frequency distribution. The binomial probability
distribution, discussed below, is one such probability distribution,
which is commonly used in statistics.
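
As an illustrative aside, the comparison of Tables 7.3 and 7.4 can be sketched in a few lines of Python. The function names below are our own choices and are not part of the text; the sketch simply lists the equally likely outcomes of two fair coins, counts the heads, and applies relation (7.12).

```python
from fractions import Fraction
from itertools import product

def heads_distribution(n_coins=2):
    """Return {x: P(x)} for the number of heads x when fair coins are tossed."""
    weight = Fraction(1, 2) ** n_coins      # each listed outcome is equally likely
    dist = {}
    for outcome in product("HT", repeat=n_coins):
        x = outcome.count("H")
        dist[x] = dist.get(x, 0) + weight
    return dist

def expected_frequencies(dist, N):
    """Relation (7.12): f_e(x) = N * p(x)."""
    return {x: N * p for x, p in dist.items()}

dist = heads_distribution(2)
expected = expected_frequencies(dist, 200)
observed = {0: 47, 1: 101, 2: 52}           # observed frequencies of Table 7.4

for x in sorted(dist):
    # x, p(x), expected frequency, observed frequency, difference
    print(x, dist[x], expected[x], observed[x], observed[x] - expected[x])
```

Exact fractions are used so that the probabilities 1/4, 1/2 and 1/4 appear without rounding.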


7.8 Binomial probability distribution

Suppose there is an experiment consisting of repeated
independent trials, each with only two possible outcomes which may
be called 'success' and 'failure', and suppose that the probability
of success in a trial remains constant. Such repeated independent
trials are called Bernoulli trials if they have the following
properties:

(i) there are only two possible outcomes, 'success' or
'failure';

(ii) the probability of success remains constant in all the
trials.


The binomial probability distribution has been developed to find
the probability of x successes in n Bernoulli trials. First, in
a trial, let us consider the occurrence of an event a 'success'
and its non-occurrence a 'failure'. Let p be the probability of
'success', and q = 1 - p be the probability of failure in a single
trial. The number of successes x in n independent trials would
be either 0 or 1 or 2 or 3, ... or n-1 or n. Obviously, x is a
random variable which can take any of these values.

The probability of x successes and (n - x) failures in
the specified order can be obtained by using the multiplicative
rule of probability (trials being independent) as under

$\underbrace{p \cdot p \cdots p}_{x \text{ times}} \cdot \underbrace{q \cdot q \cdots q}_{(n-x) \text{ times}} = p^{x} q^{n-x}$


But we are interested in any x trials resulting in success, and
these x trials may be chosen out of n in ${}^{n}C_{x}$ mutually
exclusive ways of ordering them. Thus we add the probabilities of
all these cases, or simply multiply $p^{x} q^{n-x}$ by ${}^{n}C_{x}$,
to get the general formula for the probability of x successes in n
independent trials as

$p(x) = {}^{n}C_{x}\, p^{x} q^{n-x}$    (7.13)

where x = 0, 1, 2, ..., n.

Now, for each value of the random variable x, the corresponding
probabilities may be obtained by using formula (7.13); they
are tabulated in Table 7.5.


TABLE 7.5

Binomial probability distribution

No. of Successes    Probability Function
x                   $p(x) = {}^{n}C_{x}\, p^{x} q^{n-x}$

0                   $q^{n}$
1                   ${}^{n}C_{1}\, p\, q^{n-1}$
2                   ${}^{n}C_{2}\, p^{2} q^{n-2}$
...                 ...
x                   ${}^{n}C_{x}\, p^{x} q^{n-x}$
...                 ...
n                   $p^{n}$

Total               $(q+p)^{n} = 1$

The probability distribution so obtained and given in
Table 7.5 is called the binomial probability distribution.¹ Here also
we note that

$\sum_{x=0}^{n} p(x) = p(0) + p(1) + \cdots + p(n)$
$\qquad = {}^{n}C_{0}\, p^{0} q^{n} + {}^{n}C_{1}\, p\, q^{n-1} + \cdots + {}^{n}C_{n}\, p^{n} q^{0}$
$\qquad = (q+p)^{n}$
$\qquad = 1$, as $(q+p) = 1.0$

In the above discussion, it has been observed that the formula
for p(x) in (7.13) gives all the probabilities of a random
variable x. Obviously, this is a function of x and is called
the probability function of the discrete random variable x.
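
The probability function (7.13) is easy to evaluate by machine. The following is a minimal Python sketch, not part of the original text; the names `binomial_pmf` and `binomial_distribution` are our own, and `math.comb` supplies the number of combinations. It tabulates p(x) for given n and p and confirms that the probabilities add up to 1.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n Bernoulli trials, formula (7.13)."""
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)

def binomial_distribution(n, p):
    """List [p(0), p(1), ..., p(n)], i.e. the rows of Table 7.5 for given n and p."""
    return [binomial_pmf(x, n, p) for x in range(n + 1)]

# Two fair coins (n = 2, p = 1/2) reproduce the probabilities of Table 7.2.
print(binomial_distribution(2, 0.5))

# For any n and p the probabilities should total (q + p)^n = 1.
print(sum(binomial_distribution(6, 0.3)))
```

With n = 2 and p = 1/2 the sketch returns the values 1/4, 1/2 and 1/4 obtained earlier for the pair of coins.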


7.9. Properties of binomial probability distribution

(i) the binomial probability distribution has two parameters,²
n and p. A binomial probability distribution is
completely known if the values of n and p are specified.

(ii) the mean of the binomial distribution is np.

(iii) the S.D. and the variance of the binomial distribution
are $\sqrt{npq}$ and npq respectively.

(iv) for p = q = 1/2, the binomial distribution is a symmetrical
distribution.

(v) for p ≠ q, the binomial distribution is a skewed
distribution.

(vi) when n, the number of trials, is sufficiently large and
p and q are not very small, the discrete binomial
distribution tends to a continuous normal probability
distribution, which is the subject matter of the next
chapter.

1. The binomial probabilities correspond to the successive terms
of the expansion of the term $(q+p)^{n}$. The expansion is
$(q+p)^{n} = q^{n} + {}^{n}C_{1}\, p\, q^{n-1} + {}^{n}C_{2}\, p^{2} q^{n-2} + \cdots + p^{n}$
$\qquad = p(0) + p(1) + p(2) + \cdots + p(n)$

2. The constant quantities in a probability distribution are called its
parameters.
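
Properties (ii) and (iii) can be checked numerically. The sketch below is an illustration of our own (the function name and the values n = 8, p = 0.25 are arbitrary assumptions, not taken from the text): it computes the mean and variance directly from the probabilities p(x) and compares them with np and npq.

```python
from math import comb, sqrt

def binomial_moments(n, p):
    """Mean, variance and S.D. computed directly from the probabilities p(x)."""
    q = 1 - p
    probs = [comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]
    mean = sum(x * px for x, px in enumerate(probs))
    variance = sum((x - mean) ** 2 * px for x, px in enumerate(probs))
    return mean, variance, sqrt(variance)

n, p = 8, 0.25                        # arbitrary illustrative parameters
mean, variance, sd = binomial_moments(n, p)
print(mean, n * p)                    # both should be (very nearly) np
print(variance, n * p * (1 - p))      # both should be (very nearly) npq
print(sd, sqrt(n * p * (1 - p)))      # S.D. is the square root of npq
```

Small differences in the last decimal places, if any, are only floating-point rounding.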


7.10. Binomial frequency distribution

In tossing a coin, the probability of its turning head or tail
is 1/2. Now if this experiment of tossing is repeated, say, 50
times, then, on theoretical grounds, we expect 25 heads and
25 tails, i.e., the expected frequency for head as well as for
tail will be 25 in 50 repeated experiments. The expected
frequency of an outcome is, thus, obtained by multiplying the
probability of the outcome by the number of times the
experiment is repeated.

In the case of the binomial probability distribution, let us suppose
that n trials constitute an experiment. If this experiment is
repeated N times, the expected frequency of x successes is then
given by

$f(x) = N\,p(x) = N\,{}^{n}C_{x}\, p^{x} q^{n-x}$    (7.14)

Here, x = 0, 1, 2, ..., n and
f(x) = the expected frequency of x successes in N
repeated experiments.

Varying x from 0 to n, the expected binomial frequency
distribution f(x) is obtained as shown in Table 7.6.

TABLE 7.6

Expected (Theoretical) Binomial Frequency
Distribution

No. of Successes    Probability Function                    Expected Frequency Distribution
x                   $p(x) = {}^{n}C_{x}\, p^{x} q^{n-x}$    $f(x) = N\,p(x)$

0                   $q^{n}$                                 $N\,q^{n}$
1                   ${}^{n}C_{1}\, p\, q^{n-1}$             $N\,{}^{n}C_{1}\, p\, q^{n-1}$
2                   ${}^{n}C_{2}\, p^{2} q^{n-2}$           $N\,{}^{n}C_{2}\, p^{2} q^{n-2}$
...                 ...                                     ...
x                   ${}^{n}C_{x}\, p^{x} q^{n-x}$           $N\,{}^{n}C_{x}\, p^{x} q^{n-x}$
...                 ...                                     ...
n                   $p^{n}$                                 $N\,p^{n}$

Total               $(q+p)^{n} = 1$                         $N\,(q+p)^{n} = N$

Example 7.19. A basketball player hits 75 per cent of his
shots from the free throw line. What is the probability that
he hits exactly twice in his next 5 free shots?

Solution

Let us suppose that the player's hitting a shot is a 'success'.
Then we are given that the probability of success p = 0.75. The
player makes n = 5 independent free shots. Therefore, the
probability of exactly two successes is obtained by using the
binomial probability distribution given in (7.13), i.e.,

$p(x) = {}^{n}C_{x}\, p^{x} q^{n-x}$

In our case, n = 5, p = 0.75, q = 1 - p = 0.25 and x = 2. Putting
the values, the required probability will be

$p(2) = {}^{5}C_{2}\,(0.75)^{2}(0.25)^{3}$
$\quad = 0.0879$


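Example 7.20. The probability that a student will pass in
the final examination is 0.9. What is the probability that exactly
5 of the 8 students will get success in the examination?

Solution

Let passing in the final examination be a success. Therefore,
p = 0.9 and q = 0.1.

Also given that n = 8 and x = 5 (No. of successes). The
probability of 5 successes in 8
trials can be obtained by using the binomial probability law as
given in formula (7.13).

$p(x) = {}^{n}C_{x}\, p^{x} q^{n-x}$

Putting the values,

$p(5) = {}^{8}C_{5}\,(0.9)^{5}(0.1)^{3}$
$\quad = \dfrac{8 \times 7 \times 6 \times 5 \times 4}{1 \times 2 \times 3 \times 4 \times 5} \times 0.5905 \times 0.001$
$\quad = 56 \times 0.5905 \times 0.001$
$\quad = 0.0331$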


Example 7.21. An unbiased six-faced die is thrown 4 times.
Write down the probability distribution of x, where x stands for
the number of 'sixes' on the die.

Solution

Here the trial of throwing is repeated 4 times, i.e., n = 4. Now,
in four trials the number of 'sixes' x can be 0, 1, 2, 3 or 4. Also
the trials are independent, and the probability of getting a six
(success) in each trial is p = 1/6, so that q = 1 - 1/6 = 5/6. Finally, for
varying values of x, the needed probability distribution of x is
obtained by using the binomial probability function in (7.13). The
same is shown in Table 7.7.

TABLE 7.7

Binomial probability distribution of No. of sixes in 4 trials

No. of sixes    Probability of getting x sixes
x               $p(x) = {}^{n}C_{x}\, p^{x} q^{n-x}$

0               $p(0) = (5/6)^{4} = 0.4823$
1               $p(1) = {}^{4}C_{1}\,(1/6)(5/6)^{3} = 0.3858$
2               $p(2) = {}^{4}C_{2}\,(1/6)^{2}(5/6)^{2} = 0.1157$
3               $p(3) = {}^{4}C_{3}\,(1/6)^{3}(5/6) = 0.0154$
4               $p(4) = (1/6)^{4} = 0.0008$

Total           1.0000

Example 7.22. Let the throwing of a die 4 times in
Example 7.21 be treated as an experiment, and let this experiment
be repeated 1296 times. Find the expected frequencies of the
number of sixes thus obtained.

Solution

Using formula (7.14) defining the frequency function and
the probability distribution in Table 7.7, we can get the required
expected binomial frequency distribution as given in Table 7.8
below.


TABLE 7.8

Expected Binomial Probability Distribution

No. of Sixes    p(x)      Expected Frequency
x                         f(x) = N p(x)

0               0.4823    1296 x 0.4823 = 625
1               0.3858    1296 x 0.3858 = 500
2               0.1157    1296 x 0.1157 = 150
3               0.0154    1296 x 0.0154 = 20
4               0.0008    1296 x 0.0008 = 1

Total           1.0000    N = 1296
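
The expected frequencies of Table 7.8 can also be reproduced with exact fractions, which avoids the rounding of p(x) to four decimal places. The sketch below is our own illustration of formula (7.14); the function name is an assumption made for this example.

```python
from fractions import Fraction
from math import comb

def expected_binomial_frequencies(N, n, p):
    """Formula (7.14): f(x) = N * nCx * p^x * q^(n-x) for x = 0, 1, ..., n."""
    q = 1 - p
    return [N * comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]

# Four throws of a fair die form one experiment, repeated 1296 times.
freqs = expected_binomial_frequencies(1296, 4, Fraction(1, 6))
for x, f in enumerate(freqs):
    print(x, f)          # number of sixes, expected frequency
print(sum(freqs))        # the frequencies total N
```

The value N = 1296 is convenient because it equals 6 to the power 4, so every expected frequency comes out as a whole number.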


Example 7.23. The probability that an entering college
student will graduate is 0.4. Determine the probability that
out of 5 such students

(i) none will graduate.
(ii) one will graduate.
(iii) at least one will graduate.

Solution

Let a student's graduating be treated as a success; then
p = 0.4, q = 0.6 and n = 5. From formula (7.13), the probability
of x successes in n trials is given by

$p(x) = {}^{n}C_{x}\, p^{x} q^{n-x}$
where x = 0, 1, 2, 3, 4, 5

(i) In this case, the probability that none of the students
will graduate is the probability of x = 0, i.e.,
$p(0) = {}^{5}C_{0}\,(0.4)^{0}(0.6)^{5} = 0.0778$

(ii) Similarly, by putting x = 1 in (7.13), the required
probability that exactly one will graduate out of 5 students
is obtained as
$p(1) = {}^{5}C_{1}\,(0.4)(0.6)^{4} = 0.2592$

(iii) Finally, the event that at least one will graduate can
happen if any one of the following mutually exclusive
events takes place.

(a) one will graduate, i.e., x = 1.
(b) two will graduate, i.e., x = 2.
(c) three will graduate, i.e., x = 3.
(d) four will graduate, i.e., x = 4.
(e) all five will graduate, i.e., x = 5.

Therefore, symbolically,

P[at least one will graduate]
= P[x=1 or x=2 or x=3 or x=4 or x=5]
= P[x=1] + P[x=2] + P[x=3] + P[x=4] + P[x=5]
(using additive rule of probability)
$= {}^{5}C_{1}(0.4)(0.6)^{4} + {}^{5}C_{2}(0.4)^{2}(0.6)^{3} + {}^{5}C_{3}(0.4)^{3}(0.6)^{2} + {}^{5}C_{4}(0.4)^{4}(0.6) + {}^{5}C_{5}(0.4)^{5}(0.6)^{0}$
= 0.2592 + 0.3456 + 0.2304 + 0.0768 + 0.0102
= 0.9222

Alternatively,

P[at least one will graduate]
= 1 - P[none will graduate]
= 1 - 0.0778 = 0.9222
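
The agreement between the direct sum and the shorter complement method can be confirmed with a small Python sketch. This is only an illustration; the helper `binomial_pmf` is our own name for formula (7.13).

```python
from math import comb, isclose

def binomial_pmf(x, n, p):
    """Formula (7.13): probability of exactly x successes in n trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 5, 0.4

# Direct method: add the probabilities of x = 1, 2, 3, 4, 5.
direct = sum(binomial_pmf(x, n, p) for x in range(1, n + 1))

# Complement method: 1 - P[none will graduate].
complement = 1 - binomial_pmf(0, n, p)

print(round(direct, 4), round(complement, 4))
print(isclose(direct, complement))
```

The complement method needs only one term of the distribution, which is why it is usually preferred for "at least one" questions.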

Example 7.24. Out of 1000 families with 4 children each,
how many would you expect to have (a) 1 boy (b) 2 boys?
Assume that the probability of a male birth is 1/2.

Solution

Assuming that the birth of a male child is a success (x), we
are given that
N = 1000, n = 4, p = 1/2 and q = 1/2

(a) The probability that a family of 4 children will have
1 boy can be obtained by using the binomial probability
distribution in (7.13), i.e.,

$p(x) = {}^{n}C_{x}\, p^{x} q^{n-x}$
$p(1) = {}^{4}C_{1}\,(1/2)(1/2)^{3} = 0.25$

Therefore, the expected number of families having one
boy, i.e.,
f(x=1) = N p(x=1)
= N p(1)
= 1000 x 0.25
= 250 [using formula (7.14)]

(b) Similarly, the probability of two boys in a family is

$p(x=2) = {}^{4}C_{2}\,(1/2)^{2}(1/2)^{2}$
= 0.375

Therefore, the expected number of families with two boys is
f(x=2) = N p(x=2) = 1000 x 0.375
= 375.

Example 7.25. Suppose a multiple choice question having
four choices is answered by 10 students separately and
independently at random. Find the probability that exactly 3
students will guess it correctly.

Solution

In this case, each guessing attempt is a trial with probability
of success p = 1/4 and that of failure q = 1 - p = 3/4. Therefore, the
probability that out of 10 students three will guess it correctly
is given by

$p(x=3) = {}^{10}C_{3}\, p^{3} q^{7}$ [using formula (7.13)]
$\quad = {}^{10}C_{3}\,(0.25)^{3}(0.75)^{7}$
$\quad = \dfrac{10 \times 9 \times 8}{1 \times 2 \times 3}\,(0.25)^{3}(0.75)^{7}$
$\quad = 120 \times (0.0156)(0.1335)$
$\quad = 0.2499$


Example 7.26. One-fifth of the students joining a coaching
institute do not belong to the city. If the students are grouped
at random in different batches, what is the probability that a
batch of 8 will consist of exactly 3 out-of-city students?

Solution

Here, the probability that a student does not belong to the
city is p = 1/5, and q = 1 - 1/5 = 4/5.

Therefore, the probability that out of 8 students three are
from out of the city can be obtained by using formula (7.13), i.e.,

$p(x) = {}^{n}C_{x}\, p^{x} q^{n-x}$

Putting n = 8, x = 3, p = 1/5 and q = 4/5, we have

$p(3) = {}^{8}C_{3}\,(1/5)^{3}(4/5)^{5}$
$\quad = 56 \times (0.008)(0.32768)$
$\quad = 0.1468$


Example 7.27. Find the mean, S.D. and variance of a binomial
probability distribution whose parameters are n = 10, p = 0.4.

Solution

We know that the mean, S.D. and variance of a binomial
probability distribution are np, $\sqrt{npq}$ and npq respectively.
Therefore, in the present case

Mean = np = 10 x 0.4 = 4.0
S.D. = $\sqrt{npq} = \sqrt{10 \times 0.4 \times 0.6} = 1.549$
Variance = npq = 10 x 0.4 x 0.6 = 2.4


Example 7.28. Bring out the fallacy, if any, in the following
statement: "The mean of a binomial probability distribution is
5 and its standard deviation is 3."

Solution

For a binomial probability distribution, it is given that

mean = np = 5    (i)
S.D. = $\sqrt{npq}$ = 3    (ii)
so that npq = 9    [from (ii)]
or 5q = 9    [from (i)]
or q = 9/5 = 1.8

Here, the value of q comes out to be 1.8, which is impossible
because q is the probability of failure and as such it cannot
exceed 1. Therefore, the given statement is wrong.
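
The reasoning of this example can be turned into a general check. The sketch below is our own illustration (the function name is hypothetical): since npq divided by np equals q, a claimed mean and S.D. can belong to a binomial distribution only if the ratio of variance to mean is a valid probability.

```python
from math import sqrt

def binomial_parameters(mean, sd):
    """Recover (n, p, q) from a claimed mean and S.D. of a binomial
    distribution, or return None when the claim is impossible."""
    q = sd**2 / mean          # npq / np = q
    if not 0 <= q < 1:        # q is a probability; q = 1 would force p = 0
        return None
    p = 1 - q
    n = mean / p              # from mean = np; n should also be a whole number
    return n, p, q

print(binomial_parameters(5, 3))            # the statement examined above
print(binomial_parameters(4, sqrt(2.4)))    # a consistent pair of values
```

The sketch does not test whether n is a whole number; a stricter check would add that condition.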


Problem Set 7 


1. Define the following terms as used in probability theory.
(i) Trial and event.
(ii) Equally likely events.
(iii) Mutually exclusive events.
(iv) Independent and dependent events.
(v) Simple and compound events.

2. Give classical and empirical definitions of probability and
comment by mentioning their limitations.

3. State carefully the addition and multiplicative rules of
probability.

4. State the concept of conditional probability.

5. Define the following:
(i) Random variable.
(ii) Probability distribution.
(iii) Bernoulli trials.

6. What is a binomial probability distribution? Mention its
important properties.

7. A bag contains 3 white and 2 black balls. Find the
probability of drawing a white ball.

8. A card is drawn from a well-shuffled pack of 52. Find
the probability that
(i) the card is the ace of spades.
(ii) the card is a spade.
(iii) the card is an ace.

9. A number is chosen from each of two sets
1, 2, 3, 4, 5, 6, 7, 8, 9; and 1, 2, 3, 4, 5, 6, 7, 8, 9.
If p1 is the probability that the sum of the two numbers be 10,
and p2 the probability that their sum be 8, find p1 + p2.

10. A and B throw with two dice; if A throws 9, find B's
chance of throwing a higher number.

11. A registered company has 40 female and 60 male
employees. If two employees are selected at random,
what is the probability that
(i) both will be males?
(ii) both will be females?
(iii) there will be one of each sex?

12. What is the probability of throwing more than 3 in a
single throw of an ordinary die?

13. The probability of a horse A winning a race is 1/6 and
the probability of a horse B winning the same race is 1/4.
What is the probability that one of these horses will win?

14. A problem in mathematics is given to three students A, B
and C. Their chances of solving it are 1/2, 1/3 and 1/5
respectively. What is the probability that the problem
will be solved?

15. If P(A) = 0.3, P(B) = 0.2 and P(C) = 0.1, and A, B and C are
independent events, find the probability of occurrence of
at least one of the three events A, B and C.

16. There are 3 economists, 4 engineers, 2 statisticians and
1 doctor. A committee of 4 from among them is to be
formed. Find the probability that the committee
(i) consists of one of each kind (ii) has at least one
economist (iii) has the doctor as a member and three
others.

17. Two students A and B appear in an interview for two
vacancies in the same post. The probability of A's
selection is 1/7 and of B's selection is 1/5. What is the
probability that
(i) both of them will be selected? (ii) only one of them
will be selected? (iii) none of them will be selected?


Problems on binomial distribution

18. The probability that a student will pass in high school
examination is 0.6. What is the probability that exactly
6 of 10 such students will get success in the examination?

19. The probability that a student will be admitted in an
engineering college is 0.3. What is the probability that
out of 12 applicants
(i) exactly 4 will be admitted?
(ii) less than 3 will be admitted?

20. A multiple choice question having three choices is answered
by 10 students separately and independently at random.
Find the probability that
(i) exactly 4 students will guess it correctly.
(ii) more than 7 students will guess it correctly.


Answers

7. 0.6
8. (i) 1/52 (ii) 13/52 (iii) 4/52
9. 16/81
10. 1/6
11. (i) 0.358 (ii) 0.158 (iii) 0.485
12. 1/2
13. 5/12
14. 11/15
15. 0.496
16. (i) 24/210 (ii) 175/210 (iii) 84/210
17. (i) 1/35 (ii) 10/35 (iii) 24/35
18. 0.2508
19. (i) 0.2311 (ii) 0.2528
20. (i) 0.2276 (ii) 0.0034


THE NORMAL PROBABILITY DISTRIBUTION


8.1. Introduction

In the previous chapter, we discussed a discrete distribution
called the binomial probability distribution, which arises when
Bernoulli trials are made. But in practice, we come across a
number of biological, social, economic, industrial and psychological
measurements where the variables are continuous in
nature and as such can be adequately described only by a
continuous probability distribution. One of the most important
continuous probability distributions in the entire field of
Statistics is the normal probability distribution. It has been
shown that a vast number of distributions arising in studies of
natural, social and psychological phenomena conform to the
normal distribution. The graphical expression of the normal
distribution, called the normal curve, is the bell-shaped smooth
symmetrical curve shown in Fig. 8.1.

Fig. 8.1. The normal curve


Often the distributions of quantitative data that we observe
show a concentration of frequencies near a central value of the
distribution, and the frequencies gradually taper off symmetrically
on both sides. This general tendency of quantitative data,
for a large number of measurements, gives rise to the symmetrical
bell-shaped form of a normal curve. Thus the normal
curve or distribution is a general theoretical model which may
be used to describe the frequency distribution of many variables.
An example is the frequency distribution of scores in Table 2.9.
Fig. 3.5 shows a frequency curve superimposed on the histogram
and frequency polygon of the distribution in Table 2.9. Now,
if the number of students is increased and the class size is reduced,
we can see intuitively that the frequency curve will tend to
become a bell-shaped smooth symmetrical curve as shown in
Fig. 8.1. Similarly, intelligence measured by standard tests,
educational test scores in spelling, mathematics, reading etc.,
and measures of heights and weights for a large group of
students are examples of measurements which
can usually be represented by a normal distribution or curve.


8.2. The equation of the normal curve

In 1733 De Moivre developed the mathematical equation of
the normal curve. Later Gauss (1777-1855) derived its equation
from the study of errors in repeated measurements of the same
variable. As a result, the normal distribution is also called the
Gaussian distribution in honour of Gauss.

The mathematical equation for the probability distribution of
the continuous normal variable x with mean μ and variance σ² is
given by

$f(x; \mu, \sigma) = \dfrac{1}{\sigma\sqrt{2\pi}}\; e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}, \qquad -\infty < x < \infty$    (8.1)

where x = continuous normal variable.
μ = mean of the normal distribution.
σ = standard deviation of the normal distribution.
π = 3.14159 approx.
e = 2.71828 approx.


μ and σ are the two parameters of the normal distribution
or curve. Once μ and σ are specified, the shape of the normal
curve is completely determined.
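
To see how the two parameters fix the curve, equation (8.1) can be evaluated directly. The following Python sketch is our own illustration; the function name `normal_pdf` and the sample values of the mean and S.D. are assumptions made for the example.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Ordinate (height) of the normal curve at x, equation (8.1)."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

# Height at the mean for mean 0 and S.D. 1.
print(round(normal_pdf(0, 0, 1), 4))

# Ordinates at equal distances on either side of the mean are equal.
print(normal_pdf(50 - 7, 50, 10), normal_pdf(50 + 7, 50, 10))

# With the same mean, a smaller S.D. gives a higher peak.
print(normal_pdf(50, 50, 5) > normal_pdf(50, 50, 10))
```

Changing the mean only shifts the curve along the horizontal axis, while changing the S.D. alters its height and spread.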

The shapes of some normal curves for different values of μ
(say μ1 and μ2) and S.D. σ (say σ1 and σ2) are shown in Fig.
8.2, 8.3 and 8.4.

Fig. 8.2. Two normal curves with μ1 ≠ μ2 and σ1 = σ2.

Fig. 8.2 shows the graphs of two normal curves with
unequal means but equal standard deviations.

Fig. 8.3. Two normal curves with μ1 = μ2 and σ1 ≠ σ2.

In Fig. 8.3, we have two normal curves with equal means
but different standard deviations. Here the two curves are
centred over the same point, but the curve with the smaller
standard deviation is higher and narrower in range.

Fig. 8.4 shows the sketches of two normal curves with
different means and standard deviations.

Fig. 8.4. Two normal curves with μ1 ≠ μ2 and σ1 ≠ σ2.


8.3. Properties of normal distribution or curve

Some of the important properties of the normal distribution or
curve may be listed as follows.

1. The curve is symmetrical about a vertical axis through
the mean μ, i.e., if we fold the curve along this vertical
axis, the two halves of the curve would coincide.

2. As a result of symmetry, the mean, median and mode of
the distribution are identical.

3. Since there is only one maximum point in the curve,
the normal curve is unimodal, i.e., it has only one
mode.

4. The height of the curve declines symmetrically in either
direction from the maximum point. Hence the ordinates
for values of x = μ ± k, where k is a real number, are
equal. For example, the heights of the curve or the
ordinates at x = μ + σ and x = μ - σ are exactly the same,
as shown in Fig. 8.5.

Fig. 8.5. Ordinates of a normal curve.

5. The normal curve approaches the horizontal axis
asymptotically, i.e., the curve continues to decrease in
height on both ends away from the mean but never
touches the horizontal axis. Therefore, the random
normal variable extends from -∞ to +∞, i.e.,
-∞ < x < ∞.

6. The total area under the normal curve and above the
horizontal axis is 1.0, which is essential for a probability
distribution or curve.

7. Since the shape of the normal curve is completely
determined by its parameters μ and σ, the area under
the curve bounded by two ordinates also depends
on these parameters. Some important areas under the
curve bounded by the ordinates at σ, 2σ and 3σ distances
away from the mean in either direction are shown in
Fig. 8.6, 8.7 and 8.8 respectively.

Fig. 8.6. Area between 1-sigma limits (68.27%).

Fig. 8.7. Area between 2-sigma limits (95.45%).

Fig. 8.8. Area between 3-sigma limits (99.73%).

That is,

(i) the area between ordinates at x = μ - σ and x = μ + σ is
0.6827 or 68.27%.

(ii) the area between ordinates at x = μ - 2σ and x = μ + 2σ
is 0.9545 or 95.45%.

(iii) the area between ordinates at x = μ - 3σ and x = μ + 3σ
is 0.9973, i.e., the area under the normal curve beyond
these ordinates is only 1 - 0.9973 = 0.0027, which is very
small. Thus, practically the whole area under the
normal curve lies within the limits μ ± 3σ, which are also
called 3-sigma limits.
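
The three areas quoted above can be verified without any table. The sketch below is our own illustration: it uses the standard result that the area within k standard deviations of the mean of a normal curve equals erf(k/√2), where erf is the error function available in Python's `math` module. The function name `area_within` is an assumption made for this example.

```python
from math import erf, sqrt

def area_within(k):
    """Area under a normal curve between mu - k*sigma and mu + k*sigma."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    inside = area_within(k)
    # k, area inside the k-sigma limits, area beyond them
    print(k, round(inside, 4), round(1 - inside, 4))
```

The result does not depend on the particular values of the mean and S.D., only on the number k of standard deviations.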


8.4. Area or probability under the normal curve

The area or probability under a normal probability distribution
or curve bounded by two ordinates x = a and x = b is
symbolically written as

P(a < x < b)    (8.2)

This is also the probability that a normally distributed
variable x lies between two specified values a and b. Graphically,
the probability in (8.2) is represented by the shaded area in
Fig. 8.9.

Fig. 8.9. P(a<x<b) = Shaded area.

8.5. Standard normal variable

We know that the area under a normal curve between
any two ordinates depends upon the values of its parameters μ
and σ. Obviously, the area or the probability lying between
two specified ordinates too will change with the values of μ and
σ. As such, it will be a very difficult task to calculate the area
between two ordinates for every conceivable value of μ and σ.
Fortunately, it is possible to transform any normal random
variable x with mean μ and variance σ² to a new normal
random variable z with mean 0 and variance 1. This normal
random variable z with mean 0 and variance 1 is called the
standard normal variable. The transformation of x to z is done
by using the relationship

$z = \dfrac{x - \mu}{\sigma}$    (8.3)

The area or probability under the standard normal distribution
or curve between the ordinates at mean 0 and z has been
given in Table A (at the end of the book) for values of z from
0 to 4.9. Thus Table A gives the area measured from the mean
to some positive value of z, i.e., the shaded area in Fig. 8.10.

Fig. 8.10. Area under standard normal curve as tabulated in Table
A for values of z from 0 to 4.9.
It is now clear from the above discussion that we can study
the area under a normal curve between two ordinates by
transforming the given variable to a standard normal variable.
The procedure involves the following steps.

(i) Using the transformation in (8.3), convert the given
random normal variable x with mean μ and variance σ²
to a standard normal variable z.

(ii) Now the area lying between ordinates x = a and x = b
will correspond to the area under the standard normal
curve between the ordinates $z_1 = \dfrac{a-\mu}{\sigma}$ and $z_2 = \dfrac{b-\mu}{\sigma}$.
In other words, the probability that x falls between a
and b, i.e., P(a<x<b), will correspond to the probability
that z falls between $z_1 = \dfrac{a-\mu}{\sigma}$ and $z_2 = \dfrac{b-\mu}{\sigma}$, i.e.,
$P(z_1 < z < z_2)$. This correspondence in probability
is sketched in Fig. 8.11 below.

Fig. 8.11. Transformation of x to z.

Mathematically also, it is very easy to see that

$P[a < x < b] = P\left[\dfrac{a-\mu}{\sigma} < \dfrac{x-\mu}{\sigma} < \dfrac{b-\mu}{\sigma}\right]$
[Using the transformation in (8.3)]
$\qquad = P[z_1 < z < z_2]$

(iii) Finally, use Table A to get the required area or
probability, i.e.,

$P\left[\dfrac{a-\mu}{\sigma} < z < \dfrac{b-\mu}{\sigma}\right]$

for known values of μ, σ, a and b.
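
The three steps above may be sketched in Python as follows. This is an illustration under stated assumptions, not the book's method: in place of Table A it uses the cumulative distribution function of `statistics.NormalDist` from the Python standard library, and the function name and the sample values (mean 50, S.D. 8) are our own.

```python
from statistics import NormalDist

def prob_between(a, b, mu, sigma):
    """P(a < x < b) for a normal variable x with mean mu and S.D. sigma."""
    # Step (i)-(ii): transform the limits with z = (x - mu) / sigma.
    z1 = (a - mu) / sigma
    z2 = (b - mu) / sigma
    # Step (iii): area under the standard normal curve between z1 and z2.
    std = NormalDist()            # standard normal: mean 0, S.D. 1
    return std.cdf(z2) - std.cdf(z1)

# Hypothetical values chosen so that the limits are one S.D. from the mean.
print(round(prob_between(42, 58, 50, 8), 4))

# The same area obtained without standardising, as a cross-check.
direct = NormalDist(50, 8)
print(round(direct.cdf(58) - direct.cdf(42), 4))
```

Both lines print the same value, which shows that the transformation to z changes the scale of the variable but not the probability.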


8.6. Using Table A

While using Table A for calculating area under a standard
normal curve, the following points should be taken into
consideration.

(i) the total area under a standard normal curve is 1.

(ii) the curve is symmetrical and, therefore,

$P[0 < z < a] = P[-a < z < 0]$

Fig. 8.12. P(-a<z<0) = P(0<z<a).

(iii) the mean of the distribution is 0; as such, the negative and
positive values of z will fall on the left and right of the
mean respectively.

(iv) the ordinate at z = 0 divides the area under the standard
normal curve into two equal parts.

The following examples will further clarify the above
discussion and the use of Table A.


Example 8.1. x is a normal random variable with mean
μ = 25 and σ = 5. Find the values of z1 and z2 such that
$P(20 < x < 30) = P(z_1 < z < z_2)$
Here z is a standard normal variable.

Solution

For transforming the given normal variable x to a standard
normal variable z, we use the relationship in (8.3), i.e.,

$z = \dfrac{x - \mu}{\sigma}$

Therefore, for the values x = 20 and x = 30 of the variable x,
the corresponding values of the z variable will be

$z_1 = \dfrac{20 - 25}{5} = -1$ (for x = 20)

and $z_2 = \dfrac{30 - 25}{5} = +1$ (for x = 30)

since μ and σ are given as 25 and 5 respectively. Therefore,

$P[20 < x < 30] = P[-1 < z < 1]$

The transformation is clarified graphically also in Fig. 8.13
below.

Fig. 8.13. Transformation from x to z.


Example 8.2. z is a standard normal variable. Use Table A
to determine the following probabilities.

(i) P(0<z<2)
(ii) P(-1<z<0)
(iii) P(-2<z<2)

Solution

Since z is a standard normal variable, Table A can
be used for determining the probabilities.

(i) P(0<z<2)
Here the required area or probability is between the
ordinates at the mean z = 0 and z = 2 and, therefore, for
z = 2.0 the probability is directly obtained from Table A
as P(0<z<2) = 0.4772

(ii) P(-1<z<0)
Remember that in Table A we are given the area or
probability between ordinates at the mean z = 0 and z, where z is
some positive value. For determining probabilities
other than those directly given in Table A, we, as a rule,
should always make use of the symmetrical properties of a
normal curve to convert the required area into the form
given in Table A. This table is then used to find
the required probability. Therefore,

P(-1<z<0) = P(0<z<1) [Normal curve being symmetrical]
= 0.3413 [for z = 1 from Table A]

(iii) P(-2<z<2) = P(-2<z<0) + P(0<z<2)
= P(0<z<2) + P(0<z<2)
= 2 P(0<z<2)
= 2(0.4772) [For z = 2 from Table A]
= 0.9544

The above calculations become clear from the following
sketch in Fig. 8.14.

Fig. 8.14. Shaded area is P(-2<z<2).


Example 8.3. z is a standard normal variable. Find the
following probabilities.

(i) P(-2<z<1)
(ii) P(-∞<z<-1)
(iii) P(2<z<∞)
(iv) P(-∞<z<1)

Solution

(i) Here,
P(-2<z<1) = P(-2<z<0) + P(0<z<1)
= P(0<z<2) + P(0<z<1)
= 0.4772 + 0.3413 [From Table A for z = 2.0 and z = 1.0]
= 0.8185

The probability is represented by the shaded area in Fig.
8.15.

Fig. 8.15. Shaded area is P(-2<z<1).

(ii) P(-∞<z<-1) = P(-∞<z<0) - P(-1<z<0)
= P(0<z<∞) - P(0<z<1)
= 0.5000 - 0.3413 [From Table A]
= 0.1587

This probability is represented by the shaded area in Fig. 8.16.

Fig. 8.16. Shaded area is P(-∞<z<-1).

(iii) P(2<z<∞) = P(0<z<∞) - P(0<z<2)
= 0.5000 - 0.4772 [From Table A]
= 0.0228

The shaded area in Fig. 8.17 shows the above probability.

Fig. 8.17. Shaded area is P(2<z<∞).

(iv) P(-∞<z<1) = P(-∞<z<0) + P(0<z<1)
= P(0<z<∞) + P(0<z<1)
= 0.5000 + 0.3413 [From Table A]
= 0.8413

The shaded area in Fig. 8.18 represents the above probability.

Fig. 8.18. Shaded area is P(-∞<z<1).


Example 8.4. Find the following probabilities.

(i) P(1.1<z<2.1)
(ii) P(-2.0<z<-1.2)

z is a standard normal variable.

Solution

(i) P(1.1<z<2.1) = P(0<z<2.1) - P(0<z<1.1)
= 0.4821 - 0.3643 [From Table A]
= 0.1178

The shaded area in Fig. 8.19 represents the above probability.

Fig. 8.19. Shaded area is P(1.1<z<2.1).

(ii) P(-2.0<z<-1.2) = P(-2.0<z<0) - P(-1.2<z<0)
= P(0<z<2.0) - P(0<z<1.2)
= 0.4772 - 0.3849 [From Table A]
= 0.0923

The shaded area in Fig. 8.20 represents the required
probability.

Fig. 8.20. Shaded area is P(-2.0<z<-1.2).
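
The way Table A is combined with symmetry in the last few examples can be imitated in Python. The sketch below is an illustration only: Table A itself is not reproduced here, so a stand-in function `table_a` (our own name) computes the same quantity, the area from the mean to a positive z, using `statistics.NormalDist` from the standard library.

```python
from statistics import NormalDist

_STD = NormalDist()   # standard normal variable: mean 0, S.D. 1

def table_a(z):
    """Stand-in for Table A: area between the mean (z = 0) and a positive z."""
    return _STD.cdf(z) - 0.5

def area(z1, z2):
    """P(z1 < z < z2), built from Table A entries and symmetry."""
    def signed(z):
        # Areas to the left of the mean are counted as negative.
        return table_a(abs(z)) if z >= 0 else -table_a(abs(z))
    return signed(z2) - signed(z1)

print(round(table_a(1.0), 4), round(table_a(2.0), 4))
print(round(area(-2.0, 1.0), 4))     # as in Example 8.3 (i)
print(round(area(1.1, 2.1), 4))      # as in Example 8.4 (i)
print(round(area(-2.0, -1.2), 4))    # as in Example 8.4 (ii)
```

Because the function works with unrounded areas, its answers can differ from hand calculations in the fourth decimal place when rounded table entries are added or subtracted.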

Example 8.5. I.Q. in a class is assumed to be normally
distributed with mean μ = 100 and standard deviation σ = 10.
What proportion of students do you expect to have I.Q. between
100 and 110?

Solution

Let x denote the variable I.Q. Then x is given to be a
normally distributed variable with μ = 100 and σ = 10. Here we
are required to find the proportion, area or probability
P(100<x<110), i.e., the probability of x lying between 100 and
110. For calculating this probability we first transform the
variable x to z by using the transformation in (8.3). Therefore,

$P(100 < x < 110) = P\left[\dfrac{100-\mu}{\sigma} < \dfrac{x-\mu}{\sigma} < \dfrac{110-\mu}{\sigma}\right]$
$\qquad = P\left[\dfrac{100-100}{10} < z < \dfrac{110-100}{10}\right]$
$\qquad = P(0 < z < 1)$
$\qquad = 0.3413$ (From Table A)

Thus, the required proportion of students is 0.3413 or
34.13%. The transformation and the expression of probability are
clarified graphically in Fig. 8.21.

Fig. 8.21. Shaded area is P(100<x<110) or P(0<z<1).
Example 8.6, 10! 
Distribution of sc 
ш=30 and 66.25, 
Score i 


00 students appeared in àn examination. 
Ores is assumed to be normal with mean 
How many students are expected to get 


(i) between 20 484 40. 
(1) Less than 35. 
Solution
Let x stand for the variable score. Thus x is a normal
variable with mean μ = 30 and S.D. σ = 6.25.

(i) In the first case, we need to calculate the proportion
or probability of x lying between 20 and 40, i.e.,

P(20 < x < 40) = P[(20 - μ)/σ < (x - μ)/σ < (40 - μ)/σ]
                        [using transformation in (8.3)]
               = P[(20 - 30)/6.25 < z < (40 - 30)/6.25]
               = P[-1.60 < z < 1.60]
               = P[-1.60 < z < 0] + P[0 < z < 1.60]
               = 2 × P[0 < z < 1.60]
               = 2 × 0.4452 (From Table A for z = 1.60)
               = 0.8904

Therefore, the expected number of students (frequency)
scoring between 20 and 40
= 1000 × P(20 < x < 40)
= 1000 × 0.8904
= 890 students.

The transformations and expressions used will be clearer
from Fig. 8.22.

Fig. 8.22. Shaded area is P(20 < x < 40) or P(-1.6 < z < 1.6).
(ii) In this case we are required to calculate the proportion
of students scoring below 35, i.e.,

P(x < 35) = P(-∞ < x < 35)
          = P[(-∞ - μ)/σ < (x - μ)/σ < (35 - μ)/σ]
                        [using the transformation in (8.3)]
          = P[-∞ < z < (35 - 30)/6.25]
          = P(-∞ < z < 0.80)
          = P(-∞ < z < 0) + P(0 < z < 0.80)
          = P(0 < z < ∞) + P(0 < z < 0.80)
          = 0.5000 + 0.2881 (From Table A)
          = 0.7881

Therefore, the expected number of students (expected
frequency) scoring less than 35 will be

= 1000 × P(x < 35)
= 1000 × 0.7881
= 788 students.

Fig. 8.23 will clarify the transformation and expressions.

Fig. 8.23. Shaded area is P(x < 35) or P(z < 0.80).
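Both parts of Example 8.6 turn a probability into an expected frequency by multiplying by the number of students. The sketch below repeats that calculation in Python under the same assumptions (μ = 30, σ = 6.25, 1000 students). The function names are hypothetical, and the error function stands in for Table A.

```python
import math

def phi(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_between(total, lo, hi, mu, sigma):
    """Expected number of cases with lo < x < hi."""
    return total * (phi((hi - mu) / sigma) - phi((lo - mu) / sigma))

def expected_below(total, limit, mu, sigma):
    """Expected number of cases with x < limit."""
    return total * phi((limit - mu) / sigma)

between_20_and_40 = expected_between(1000, 20, 40, mu=30, sigma=6.25)
below_35 = expected_below(1000, 35, mu=30, sigma=6.25)

print(round(between_20_and_40))  # part (i)
print(round(below_35))           # part (ii)
```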
Solution
In transforming the variable from x to z, the relationship
in (8.3) is used, which reads

z = (x - μ)/σ

The same relationship can also be used as a transformation
from z to x (μ and σ being known) in the following manner:

x = μ + σz                                        ... (i)

Further, in the previous example, we were concerned with the
area or probability bounded by ordinates at different values of
z, but the present problem is just the reverse. Here we have
to find the centrally located limits, say -a to a, within which
70 per cent or 0.70 area (0.35 on each side of the mean) is
covered, as shown in Fig. 8.24.

Fig. 8.24. Shaded area is central 0.70.

Since Table A is only concerned with a standard normal
variable z, the Table is first used to determine such
limits on the z-scale. Symbolically, we need

   P(-a < z < a) = 0.70
or P(-a < z < 0) + P(0 < z < a) = 0.70
or P(0 < z < a) + P(0 < z < a) = 0.70
or 2 P(0 < z < a) = 0.70
or P(0 < z < a) = 0.35                            ... (ii)
Now, we go through Table A and locate a value of
z = a corresponding to the area 0.35 as obtained in equation (ii).
Such a value is a = 1.04. Thus, the centrally located limits on
the z-scale covering 0.70 area or probability are -1.04 to 1.04
(see Fig. 8.24).

Finally, the two limits can be transformed to the x scale by
using the relationship in equation (i), i.e.,

x = μ + σz, where μ = 30 and σ = 6.25 (given).

Putting z = -1.04, the corresponding lower limit on the x scale
becomes

x = 30 + (6.25 × (-1.04))
  = 30 - 6.50
  = 23.5 ≈ 24 in whole numbers.

Similarly, for z = 1.04, the upper limit on the x scale is obtained
as

x = 30 + (6.25 × 1.04)
  = 36.50
  ≈ 37 in whole numbers.

Thus the central limits of scores within which 70 per cent
of the students score are 24 to 37.

Example 8.8. In an examination the scores are found to be
normally distributed with mean μ = 70 and S.D. σ = 8. If
18 per cent of the students are in Ist division, what is the
cut-off point for a first division?

Solution

In this case also we proceed like Example 8.7. Here also,
from a known area or probability, a value on the z scale is to be
determined and then the same is transformed to the x scale using
the relationship

x = μ + σz
  = 70 + 8z                                       ... (i)

An area of 18 per cent or 0.18, corresponding to the
proportion of students receiving Ist division, is shaded in Fig. 8.25.

Fig. 8.25. Shaded area is P(a < z < ∞).

According to the given example, a value of z = a on the z scale is
needed such that

   P(a < z < ∞) = 0.18
or P(0 < z < ∞) - P(0 < z < a) = 0.18
or 0.5000 - P(0 < z < a) = 0.18
or P(0 < z < a) = 0.50 - 0.18
                = 0.32                            ... (ii)

Thus in equation (ii) we have converted the required area
under the normal curve into the form of the tabulated area in Table A.
Then, by using Table A, the value of z = a corresponding to
area 0.32 is z = a = 0.92.

This lowest cut-off point a = 0.92 for a Ist division score on the
z-scale is then transformed to the x scale by using the relationship
in equation (i), i.e.,

x = 70 + 8z
  = 70 + 8 × 0.92 (putting z = 0.92)
  = 70 + 7.36
  = 77.36 ≈ 78 in whole numbers.

Thus, a score of 78 is the lowest for securing Ist division
in the said examination.
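Examples 8.7 and 8.8 both read Table A in reverse: an area is known and the corresponding score is sought through x = μ + σz. A hedged sketch of that reverse look-up follows, assuming Python 3.8 or later, where `statistics.NormalDist.inv_cdf` returns the value below which a given area lies. The helper names `central_limits` and `upper_cutoff` are illustrative.

```python
from statistics import NormalDist

def central_limits(mu, sigma, coverage):
    """Limits placed symmetrically about the mean that enclose `coverage` of the area."""
    tail = (1.0 - coverage) / 2.0
    dist = NormalDist(mu, sigma)
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)

def upper_cutoff(mu, sigma, top_fraction):
    """Score above which `top_fraction` of the area lies."""
    return NormalDist(mu, sigma).inv_cdf(1.0 - top_fraction)

# Example 8.7: central 70 per cent, mean 30, S.D. 6.25
low, high = central_limits(30, 6.25, 0.70)
print(round(low, 2), round(high, 2))

# Example 8.8: top 18 per cent, mean 70, S.D. 8
cut = upper_cutoff(70, 8, 0.18)
print(round(cut, 2))
```

The unrounded limits differ slightly from 23.5, 36.5 and 77.36 because the text takes z from Table A to two decimal places (1.04 and 0.92); after rounding up to whole scores the conclusions are the same.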
8.7. The Normal Approximation to Binomial Distribution

In case the number of trials n is very large, it becomes very
difficult to calculate probabilities by using the binomial probability
rule (7.13). Fortunately, under certain conditions, the discrete
binomial distribution approaches the continuous normal
distribution and as such facilitates calculations.

If n is large and neither p nor q is close to zero, it turns
out that the binomial distribution can be closely approximated
by a normal distribution. The approximation is very accurate when
n is large and fairly good for small values of n if p is close
to 1/2. Under these conditions a binomially distributed variable
X tends to be normally distributed with mean μ = np and S.D.
σ = √(npq). Therefore, the standardized variable

z = (X - μ)/σ = (X - np)/√(npq)

will be approximately a standard normal variable with mean 0 and
S.D. 1. The following examples will show how this approximation
works.

Example 8.9. Find the probability in Example 7.24 using
the normal approximation to binomial.


Solution

Here n = 10, p = .25, q = .75 and the exact probability that
3 students will guess the questions correctly was obtained as
(using the binomial rule)

P(x = 3) = 10C3 (.25)^3 (.75)^7
         = .2499

For using the normal approximation to the binomial we
should consider the following points:

(i) that the normal curve or distribution will have the
same mean and variance as the binomial variable x, i.e.,

mean = μ = np = 10 × .25 = 2.5 and
variance = σ² = npq = 10 × .25 × .75 = 1.875.

(ii) that the exact probability of the discrete binomial random
variable x assuming a given value x = 3 is approximated by the
area lying under the normal curve between the two
ordinates x1 = 2.5 and x2 = 3.5, since a discrete value
x = 3 will approximately represent the interval 2.5 to 3.5
on a continuum. With these considerations our problem
reduces to:

x is a normal variable with mean 2.5 and S.D. σ = √1.875.
Find the probability P(x1 < x < x2) = P[2.5 < x < 3.5].

Now consider,

P[2.5 < x < 3.5] = P[(2.5 - μ)/σ < (x - μ)/σ < (3.5 - μ)/σ]
                 = P[(2.5 - 2.5)/√1.875 < z < (3.5 - 2.5)/√1.875]
                        [using transformation in (8.3)]
                 = P[0 < z < 0.73]
                 = 0.2673 (from Table A for z = 0.73)

The transformation and expressions used will be clearer
from Fig. 8.26.

Fig. 8.26. Shaded area is P(2.5 < x < 3.5) = P(0 < z < 0.73).

Remark. In this example we observe that the normal
approximation does not provide a close probability value.
Here the exact probability is 0.2499 while the approximate
probability comes out to be 0.2673. This is because
neither is n very large nor is p close to 1/2 in the present example.
For a satisfactory approximation let us consider another
example.
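The comparison in Example 8.9 can be reproduced directly. The sketch below is an illustration only: it assumes Python 3.8 or later (for `math.comb`), and the function names are hypothetical. It evaluates the binomial rule for x = 3 and the normal area between 2.5 and 3.5. Floating-point evaluation of the binomial term gives about 0.2503, marginally different from the 0.2499 quoted above, which presumably reflects rounding in hand computation.

```python
import math

def phi(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binomial_probability(n, k, p):
    """Exact probability of k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def normal_approximation(n, k, p):
    """Area between k - 0.5 and k + 0.5 under a normal curve
    with mean np and variance npq."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return phi((k + 0.5 - mu) / sigma) - phi((k - 0.5 - mu) / sigma)

exact = binomial_probability(10, 3, 0.25)
approx = normal_approximation(10, 3, 0.25)
print(f"exact = {exact:.4f}, approximate = {approx:.4f}")
```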


Example 8.10. Find the probability of getting between 3 and
6 heads inclusive in 10 tosses of a fair coin by using (a) the
binomial distribution, (b) the normal approximation to the
binomial distribution.


Solution
Here, n = 10, p = 1/2, q = 1/2.

(a) p(x) = nCx (1/2)^x (1/2)^(n-x), using (7.13).

The required probability is

P(3 ≤ x ≤ 6) = Σ p(x), summed over x = 3 to 6
             = Σ 10Cx (1/2)^10, summed over x = 3 to 6
             = (1/512) [60 + 105 + 126 + 105]
             = 0.7734

(b) For investigating the normal approximation, we first
find

mean = μ = np = 10 × 1/2 = 5.0
variance = σ² = npq = 10 × 1/2 × 1/2 = 2.5

Thus, x ~ N(5.0, 2.5)
and z = (x - 5.0)/√2.5 = (x - 5.0)/1.58 ~ N(0, 1)
                        [using the transformation (8.3)]

Assuming the variable to be continuous, 3 to 6 heads can be
considered as 2.5 to 6.5 heads. Thus, we need to find

P[2.5 < x < 6.5] = P[(2.5 - 5.0)/1.58 < (x - 5.0)/1.58 < (6.5 - 5.0)/1.58]
                        [using the transformation (8.3)]
                 = P[-1.58 < z < 0.95]
                 = P[-1.58 < z < 0] + P[0 < z < 0.95]
                 = P[0 < z < 1.58] + P[0 < z < 0.95]
                 = 0.4429 + 0.3289 [using Table A]
                 = 0.7718

which compares well with the exact value 0.7734 as obtained by
using the binomial distribution. The approximation will be
closer for large n.

The transformations and expressions used in part (b) are
clarified in Fig. 8.27.

Fig. 8.27. Shaded area is P(2.5 < x < 6.5) = P(-1.58 < z < 0.95).


Example 8.11. A multiple-choice paper has 300 questions,
each with 4 possible answers of which only 1 is the correct
answer. What is the probability that sheer guesswork yields
from 25 to 30 correct answers for 80 of the 300 problems about
which the student has no knowledge?


Solution
Let x represent the number of correct answers. Here
p = 1/4, the probability of a correct answer by guessing, for
each of the n = 80 questions.

For using the normal approximation, we calculate

mean = μ = np = 80 × 1/4 = 20.0
variance = σ² = npq = 80 × 1/4 × 3/4 = 15.0

Thus x ~ N(20, 15)
and z = (x - 20)/3.87 ~ N(0, 1) [using the transformation (8.3)]

Assuming the variable to be continuous, guessing from 25 to
30 questions correctly may be considered as 24.5 to 30.5
questions. Thus the required probability is

P(24.5 < x < 30.5) = P[(24.5 - 20)/3.87 < (x - 20)/3.87 < (30.5 - 20)/3.87]
                   = P[1.16 < z < 2.71] [using the transformation (8.3)]
                   = P[0 < z < 2.71] - P[0 < z < 1.16]
                   = 0.4966 - 0.3770
                   = 0.1196

The expressions used are clarified in Fig. 8.28.

Fig. 8.28. Shaded area is P(24.5 < x < 30.5) = P(1.16 < z < 2.71).


In short, we can say that the normal approximation to the
binomial is excellent when n is large and fairly good for
moderately large values of n if p is close to 1/2.
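To see how the quality of the approximation varies with n and p, the three worked examples can be placed side by side. The following is a minimal sketch, assuming Python 3.8 or later; the function names are illustrative, and the exact values are obtained by summing the binomial rule rather than taken from the text.

```python
import math

def phi(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def exact_range(n, p, lo, hi):
    """Exact binomial probability of lo to hi successes inclusive."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(lo, hi + 1))

def approx_range(n, p, lo, hi):
    """Normal approximation: area between lo - 0.5 and hi + 0.5."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return phi((hi + 0.5 - mu) / sigma) - phi((lo - 0.5 - mu) / sigma)

cases = [
    ("Example 8.9", 10, 0.25, 3, 3),
    ("Example 8.10", 10, 0.50, 3, 6),
    ("Example 8.11", 80, 0.25, 25, 30),
]
for name, n, p, lo, hi in cases:
    e = exact_range(n, p, lo, hi)
    a = approx_range(n, p, lo, hi)
    print(f"{name}: exact={e:.4f} approx={a:.4f} difference={abs(e - a):.4f}")
```

The approximate values here use unrounded z, so they may differ in the last decimal place from figures obtained with Table A.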


Problem Set 8

1. Define a normal probability distribution.

2. Write down the mathematical form of a normal curve.

3. Mention some of the important properties of a normal
   curve.

4. What is a standard normal variable?

5. State the conditions under which the binomial distribution
   can be approximated by the normal distribution.

6. For a normal variable x with mean μ = 40 and σ = 6, find:
   (i) the area above 27
   (ii) the area between 42 and 51
   (iii) the point that has 45% of the area below it.

7. Given the normally distributed variable x with mean 18
   and S.D. 2.5, find:
   (i) P(x < 15) (ii) P(17 < x < 21) (iii) the value of a such
   that P(x < a) = 0.2578.

8. Determine the value of z in the following cases using the
   area Table A.
   (i) the area between 0 and z is 0.3770
   (ii) the area to the left of z is 0.8621
   (iii) the area between -1.5 and z is 0.0217.

9. The mean mark on a final examination was 72 and the
   S.D. was 9. The top 10% of the students are to receive
   A's. What is the minimum mark a student must get in
   order to receive an A?

10. In a mathematics examination the average grade was 82
    and the S.D. 5. All students with grades from 88 to 94
    received a grade of B. If the grades are approximately normally
    distributed and 8 students received a B grade, how many
    students took the examination?

11. Find the error in calculating the sum of p(x) from x = 1
    to x = 4 by using the normal approximation to the binomial
    distribution where n = 20 and p = 0.1.

12. Find the probability that a student can guess correctly the
    answers to (i) 12 or more out of 20, (ii) 24 or more out of
    40 questions on a true-false examination.
    (Hint: Use the normal approximation to the binomial
    distribution. The probability of guessing correctly on a
    true-false test is 1/2.)

13. A coin is tossed 400 times. Use the normal approximation
    to find the probability of getting:
    (i) between 185 and 210 heads inclusive.
    (ii) exactly 205 heads.


Answers to Problem Set 8

6. (i) 0.9849 (ii) 0.3362 (iii) 39.24
7. (i) 0.1151 (ii) 0.5403 (iii) a = 16.38
8. (i) z = 1.16 (ii) z = 1.09 (iii) z = -1.35
9. 84
10. 62
11. 0.0018
12. (i) 0.2511 (ii) 0.1342
13. (i) 0.7925 (ii) 0.0352
Appendix 1


Table A Areas under the standard normal curve.
Table B Logarithms.
Table C Antilogarithms.
APPENDIX 2


In obtaining probabilities of complex events the counting of
cases (favourable and exhaustive) is often difficult. To reduce
the labour involved, a few counting rules are discussed below.


Fundamental rules of counting

Rule 1. If an event can happen in any one of m ways and
if, when this has occurred, another event can happen in any one
of n ways, then the number of ways in which both events can
happen in the specified order is m × n = mn.

Example 1. There are 3 candidates for presidentship and
4 for vice-presidentship in a union election of a certain college.
In how many ways can the two offices be filled?

The office of the president can be filled by any of the three
candidates. For each of these 3 ways the office of the
vice-president can be filled in 4 ways. Therefore, the two offices can
be filled in 3 × 4 = 12 ways.

Example 2. Count the number of exhaustive cases when a
pair of dice is thrown once.

The first die can land in any one of six ways. For each of
these six ways the second can also land in 6 ways. Therefore,
the pair of dice can land in 6 × 6 = 36 ways.

Rule 2. If an event A can occur in a total of m ways and if
a different event B can occur in n ways, then the event A or B
can occur in m + n ways, provided the two events are mutually
exclusive (cannot occur simultaneously).

Example 3. In a certain class a class representative is to be
chosen from 2 female and 3 male candidates. Count the ways
in which a class representative can be chosen.

Here a female representative can be chosen in 2 ways and a
male in 3 ways. Therefore, the number of ways in which a
class representative can be chosen will be 2 + 3 = 5.
Example 4. A bag contains 6 red, 4 white and 3 blue balls.
Count the number of cases in which a ball drawn at random is
either red or white.

A red ball can be drawn in 6 cases and a white in 4 cases.
Therefore, the required number of cases will be 6 + 4 = 10.


Factorial Symbol

In the following rules we will observe that products of
consecutive integers are involved. We represent such a product by
a factorial symbol. For example, the product 5 × 4 × 3 × 2 × 1 is
written as 5! and referred to as '5 factorial'. In general, for any
positive integer n, the product n(n-1)(n-2)...(3)(2)(1) is
represented by the symbol n!, which is read as 'n factorial'.
By definition 1! = 0! = 1.


Permutation

A permutation is an arrangement of all or part of a set of
objects. Consider the three letters a, b and c. The possible
permutations of these three letters are abc, acb, bac, bca, cab,
cba. Thus, we arrive at 6 different arrangements of three letters
or objects. Using rule 1, we could have arrived at the result
without actually writing the different orders. Here, there are 3
positions to be filled from the three letters. Thus, we have
3 choices for the first position, 2 for the second, leaving only 1
for the last position, giving a total of 3 × 2 × 1 = 6 permutations.
In general, the number of permutations of n distinct objects will
be

n(n-1)(n-2)...(3)(2)(1) = n!                      ... (1)


Permutations of n objects taken r at a time

The number of permutations of the three letters a, b and c
will be 3! = 6. Let us consider now the number of permutations
that are possible by taking the 3 letters 2 at a time. These
permutations would be ab, ac, ba, ca, bc, cb. Applying rule 1
again, we have 2 positions to fill with 3 choices for the first and
2 choices for the second, i.e., a total of 3 × 2 = 6 permutations.
In general, n distinct objects taken r at a time can be arranged in
n(n-1)(n-2)...(n-r+1) ways. This product is represented
by the symbol

nPr = n(n-1)(n-2)...(n-r+1) = n!/(n-r)!           ... (2)

Example 5. In how many ways can 5 students be lined up to
get on a bus?

Using (1), the total number of such permutations would be
5! = 5 × 4 × 3 × 2 × 1 = 120.

Example 6. In how many ways can the 4 starting positions in
a team be filled with 9 students who can play at any of the
positions?

Here it is a problem of arranging 9 students taking 4 at a
time. Using (2), we have

9P4 = 9!/(9-4)! = (9 × 8 × 7 × 6 × 5!)/5!
    = 3024.

Example 7. Two lottery tickets are drawn from 25 for first
and second prizes. Count the total number of arrangements
of the tickets.

The total number of such arrangements will be

25P2 = 25!/(25-2)! = (25 × 24 × 23!)/23!
     = 600.


Combinations

We observed that n distinct objects taken r at a time can be
arranged in nPr = n!/(n-r)! ways. When the order of the r chosen
objects is disregarded, each selection is counted r! times among
these arrangements. The number of such selections, called
combinations, is therefore

nCr = nPr/r! = n!/(r!(n-r)!)                      ... (3)

Example 8. The number of combinations of the letters a, b, c
taken two at a time is

3C2 = 3!/(2! 1!)
    = 3.

Example 9. From 5 boys and 6 girls find the number of
committees of 3 that can be formed with 2 boys and 1 girl.

The number of ways of selecting 2 boys out of 5 is

5C2 = 5!/(2! 3!) = 10 [using (3)]

Similarly one girl out of six can be selected in

6C1 = 6!/(1! 5!) = 6 ways.

Using rule 1, the number of committees that can be formed
with 2 boys and 1 girl will be 10 × 6 = 60.
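The counts in Examples 1 to 9 can be confirmed mechanically. The sketch below is illustrative only and assumes Python 3.8 or later, where `math.perm` and `math.comb` evaluate formulas (2) and (3); the variable names are hypothetical. It also lists the two-letter arrangements and selections of a, b, c by enumeration.

```python
import math
from itertools import combinations, permutations

# Rule 1 (multiply) and Rule 2 (add)
offices = 3 * 4            # Example 1
dice_outcomes = 6 * 6      # Example 2
representatives = 2 + 3    # Example 3
red_or_white = 6 + 4       # Example 4

# Formulas (1), (2) and (3)
lineups = math.factorial(5)                     # Example 5: 5!
team = math.perm(9, 4)                          # Example 6: 9P4
tickets = math.perm(25, 2)                      # Example 7: 25P2
pairs = math.comb(3, 2)                         # Example 8: 3C2
committees = math.comb(5, 2) * math.comb(6, 1)  # Example 9

# Enumerating the letters a, b, c taken two at a time
arrangements = ["".join(t) for t in permutations("abc", 2)]
selections = ["".join(t) for t in combinations("abc", 2)]

print(offices, dice_outcomes, representatives, red_or_white)
print(lineups, team, tickets, pairs, committees)
print(arrangements)
print(selections)
```

Enumeration makes the distinction visible: ab and ba are two different permutations but one and the same combination.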
Index


Abscissa, 45

Actual class limits, 25 

Addition rule of probability, 169

a-priori probability, 162 

Arithmetic mean, 62 

Array, 20 

Asymmetrical curve 
moderately, 52 
extremely, 54 

Average, 61

Average deviation, 98 


Bernoulli trial, 182 
Bimodal distribution, 83 
Binomial probability distribution, 
182,184 
normal approximation, 216 
Binomial frequency distribution, 185 
Bivariate distribution, 127, 141 
Bivariate table, 36 


Caption, 32 
Central value, 61 
Class frequency, 26 
Class interval, 21 
Classification 
dichotomous, 19 
manifold, 20 
qualitative, 19 
quantitative, 18, 19 
simple, 19 
Coefficient of correlation, 130 
Coefficient of relative dispersion, 120 
Coefficient of variation, 120 
Complementary event, 163 


Compound or joint events, 161 
Conditional Probability, 127 
Continuous variable, 16 
Continuous probability distribution, 179
Correlation, 127 

curvilinear, 128 

linear, 128 

negative, 128 

non-linear, 128 

positive, 128 

rank, 146 
Covariation, 127 


Cumulative frequency curve, 55
less than, 56
more than, 57
Cumulative frequency distribution, 28
less than, 29
more than, 29
Curvilinear correlation, 128


Data 
primary, 33 
secondary, 33 
Dependent events, 162 
Discontinuous variable, 16 
Discrete distribution, 179 
Discrete variable, 16 
Dispersion, 96 
Dot diagram, 129 


Events, 160 
complementary, 163 
dependent, 162 
equally likely, 160 
independent, 162
joint, 161 
mutually exclusive, 160 
simple, 161 

Exclusive classes, 22


Favourable cases, 161 
Frequency, 
curve, 51, 52 
distribution, 21 
polygon, 49
table, 21, 27


Gaussian distribution, 197
Graphical presentation, 43


Histogram, 45


Inclusive classes, 22, 24 
Independent events, 162 
Interval scale, 17, 18 


Joint event, 161 

J-shaped curve
negatively, 54
positively, 54


Karl Pearson's coefficient of correlation, 129, 130


Linear correlation, 128 
perfect, 128 


Manifold classification, 20
Manifold table, 37 

Mean, 62 

Mean deviation, 98 

Measures of central tendency, 61 
Measures of dispersion, 96, 97
Measures of location, 61 
Median, 76 

Mid-point, 28 

Mode, 82 

Modal class, 86 




Modal value, 86 

Multiplicative rule of probability, 
169, 171, 14 

Mutually exclusive events, 160 


Nominal number, 17 

Non-linear correlation, 128 

Normal curve, 196 

Normal variable, 197 
standard, 202


Ogive, 55 
less than, 56 
more than, 57 
One way table, 35 
Ordinal number, 15 
Ordinate, 45 


Parameters, 7 
Perfect linear correlation, 128 
Population, 5 

finite, 5 

infinite, 6 
Position average, 76 
Positive correlation, 128 
Primary data, 33 
Probability, 159 

a-priori, 162 

classical, 162 

empirical, 164 
Probability distribution, 179 

theoretical, 179 
Probability function, 184 
Product moment correlation coefficient, 129


Qualitative classification, 19 
Qualitative variable, 7 


Quantitative classification, 19
Quantitative variable, 7 


Range, 97 

Rank Correlation, 146 
Ratio scale, 17 

Raw data, 19 
Relative frequency distribution, 29
Root mean square deviation, 104 


Sample, 6 
Sampling statistic, 8 
Scale, 
interval, 17, 18 
ratio, 17 
Scatter diagram, 129 
Scatteredness, 96 
Secondary data, 33 
Simple event, 161 
Simple table, 35 
Skew curve, 52 
negatively, 53 
positively, 53 
Standard deviation, 104 
Standard normal variable, 202
Statistic, 3, 8
Statistical inference, 2, 8
Statistics, 1, 2, 3, 14
descriptive, 2
inferential, 2


Stubs, 32 
Symmetrical bell shaped curve, 52 


Table, 29 

Tabulation, 29 

Theoretical frequency distribution, 
180 

Theoretical probability distribution, 
179 

Trial, 160 

True class limit, 22 

Two way table, 36 


Unfavourable cases, 161 
Ungrouped data, 19 
Unimodal, 82 

U-shaped curve, 54 


Variable, 7, 14 
random, 178 

Variability, 15 

Variates, 15 


Of allied interest


ABNORMAL PSYCHOLOGY
S.K. Mangal, 1987, 272pp


The present book provides a workable base for the
understanding of basic concepts of abnormal behaviour and abnormal
psychology. Starting with the concept, nature and background
of abnormal behaviour and abnormal psychology, it takes up
the types of abnormalities and disorders of human behaviour
and suggests possible treatment by combining physical as well
as socio-psychological therapeutic measures. Written in a simple
but well-organised style, it will prove beneficial not only to the
students of abnormal psychology and mental health of the
graduate and post-graduate courses, but will also be useful to
parents and teachers for understanding and improving their
own mental health as well as that of the people whose welfare
is entrusted to them.


S.K. Mangal teaches at C.R. College of Education, Rohtak.


EDUCATIONAL PSYCHOLOGY
C.L. Kundu and D.N. Tutoo, 1988, 624pp
Fifth Revised and Enlarged Edition


This fifth revised and enlarged edition of Educational
Psychology focuses on intellectual learning and development on
the one hand and educational and teaching practices on the
other. It provides a thorough coverage of the psychological
theory and research that underlie educational practice. The
authors have maintained an eclectic approach and define the
field of educational psychology, drawing widely from studies
conducted by educational researchers, sociologists,
anthropologists, educationists and linguists.


C.L. Kundu is Dean, Faculty of Education, Kurukshetra University.


D.N. Tutoo, M.A., M.Ed., Ph.D., is engaged in research on testing
personality and evaluation at the Defence Institute of Psychological
Research, New Delhi.